Abstract

We consider the following signal recovery problem: given a measurement matrix $\Phi \in \mathbb{R}^{n\times p}$ and a noisy observation vector $c \in \mathbb{R}^{n}$ constructed from $c = \Phi\theta^* + e$, where $e \in \mathbb{R}^{n}$ is the noise vector whose entries follow an i.i.d. centered sub-Gaussian distribution, how can we recover the signal $\theta^*$ if $D\theta^*$ is sparse under a linear transformation $D \in \mathbb{R}^{m\times p}$? One natural method using convex optimization is to solve the following problem:

$$\min_{\theta}\ \frac{1}{2}\|\Phi\theta - c\|^2 + \lambda\|D\theta\|_1.$$

This paper provides an upper bound of the estimate error and shows the consistency property of this method by assuming that the design matrix $\Phi$ is a Gaussian random matrix. Specifically, we show 1) in the noiseless case, if the condition number of $D$ is bounded and the measurement number $n \geq O(s\log(p))$, where $s$ is the sparsity number, then the true solution can be recovered with high probability; and 2) in the noisy case, if the condition number of $D$ is bounded and the measurement number increases faster than $s\log(p)$, that is, $s\log(p) = o(n)$, the estimate error converges to zero with probability 1 when $p$ and $s$ go to infinity. Our results are consistent with those for the special case $D = I_{p\times p}$ (equivalently LASSO) and improve the existing analysis. The condition number of $D$ plays a critical role in our analysis. We consider the condition numbers in two cases, the fused LASSO and the random graph: the condition number in the fused LASSO case is bounded by a constant, while the condition number in the random graph case is bounded with high probability if $m/p$ (i.e., #edges/#vertices) is larger than a certain constant. Numerical simulations are consistent with our theoretical results.



1 Introduction 



The sparse signal recovery problem has been well studied recently, from the theory aspect to the application aspect, in many areas including compressive sensing [Candès and Plan, 2009; Candès and Tao, 2007], statistics [Ravikumar et al., 2008; Bunea et al., 2007; Koltchinskii and Yuan, 2008; Lounici, 2008; Meinshausen et al., 2006], machine learning [Zhao and Yu, 2006; Zhang, 2009b; Wainwright, 2009; Liu et al., 2012], and signal processing [Romberg, 2008; Donoho et al., 2006; Zhang, 2009a]. The key idea is to use the $\ell_1$ norm to relax the $\ell_0$ norm (the number of nonzero entries). This paper considers a specific type of sparse signal recovery problem, in which the signal is assumed to be sparse under a linear transformation $D$. It includes the well-known fused LASSO [Tibshirani et al., 2005] as a special case. The theoretical properties of such problems are not yet well understood, although they have achieved success in many applications [Chan, 1998; Tibshirani et al., 2005; Candès et al., 2006; Sharpnack et al., 2012]. Formally, we define the problem as follows: given a measurement matrix $\Phi \in \mathbb{R}^{n\times p}$ ($p \gg n$) and a noisy observation vector $c \in \mathbb{R}^{n}$ constructed from $c = \Phi\theta^* + e$, where $e \in \mathbb{R}^{n}$ is the noise vector whose entries follow an i.i.d. centered sub-Gaussian distribution¹, how can we recover the signal $\theta^*$ if $D\theta^*$ is sparse, where $D \in \mathbb{R}^{m\times p}$ is a constant matrix dependent on the specific application²? A natural model for this type of sparsity recovery problem is:



$$\min_{\theta}\ \frac{1}{2}\|\Phi\theta - c\|^2 + \lambda\|D\theta\|_0 \tag{1}$$



The least squares term comes from the sub-Gaussian noise assumption and the second term is due to the sparsity requirement. Since this combinatorial optimization problem is NP-hard, the conventional $\ell_1$ relaxation technique can be applied to make it tractable, resulting in the following convex model:



$$\min_{\theta}\ \frac{1}{2}\|\Phi\theta - c\|^2 + \lambda\|D\theta\|_1 \tag{2}$$
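As a concrete reference point, the sketch below evaluates the objective of Eq. (2) for given $\Phi$, $D$, $c$, $\theta$, and $\lambda$ using plain Python lists. It is illustrative only: the helper names `matvec` and `objective` are hypothetical, the code assumes the $\frac{1}{2}$ scaling of the least squares term used in the proofs of the Appendix, and it does not solve the problem (the experiments in Section 5 use CVX for that).

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product M v for row-major lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def objective(Phi: Matrix, D: Matrix, c: Vector, theta: Vector, lam: float) -> float:
    """Return 1/2 * ||Phi theta - c||^2 + lam * ||D theta||_1, the objective of Eq. (2)."""
    residual = [r - c_i for r, c_i in zip(matvec(Phi, theta), c)]
    least_squares = 0.5 * sum(r * r for r in residual)
    penalty = lam * sum(abs(z) for z in matvec(D, theta))
    return least_squares + penalty


if __name__ == "__main__":
    Phi = [[1.0, 0.0], [0.0, 1.0]]
    D = [[1.0, -1.0]]  # one difference row: penalizes |theta_1 - theta_2|
    c = [1.0, 2.0]
    print(objective(Phi, D, c, [1.0, 1.0], lam=0.5))  # fit 0.5, penalty 0.0 -> 0.5
    print(objective(Phi, D, c, [1.0, 2.0], lam=0.5))  # fit 0.0, penalty 0.5 -> 0.5
```

The two calls show the trade-off the model encodes: one candidate fits the data exactly but pays the sparsity penalty, the other has zero penalty but a nonzero residual.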



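Such a model includes many well-known sparse formulations as special cases:

• The fused LASSO [Tibshirani et al., 2005; Friedman et al., 2007] solves

$$\min_{\theta}\ \frac{1}{2}\|\Phi\theta - c\|^2 + \lambda_1\|\theta\|_1 + \lambda_2\|Q\theta\|_1 \tag{3}$$

where $Q \in \mathbb{R}^{(p-1)\times p}$ is defined as $Q = [I_{(p-1)\times(p-1)}\ \ 0_{p-1}] - [0_{p-1}\ \ I_{(p-1)\times(p-1)}]$, that is,

$$Q = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ 0 & 1 & -1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -1 \end{bmatrix}.$$

One can write Eq. (3) in the form of Eq. (2) by letting $\lambda = 1$ and $D$ be the concatenation of the identity matrix and the total variation matrix, that is,

$$D = \begin{bmatrix} \lambda_1 I_{p\times p} \\ \lambda_2 Q \end{bmatrix}.$$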
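To make the reduction from Eq. (3) to Eq. (2) concrete, here is a small plain-Python sketch (hypothetical helper names, 0-based indices) that builds $Q$ and $D = [\lambda_1 I_{p\times p};\ \lambda_2 Q]$ and checks numerically that $\|D\theta\|_1 = \lambda_1\|\theta\|_1 + \lambda_2\|Q\theta\|_1$ for one vector, which is exactly why Eq. (3) is Eq. (2) with $\lambda = 1$.

```python
from typing import List

Matrix = List[List[float]]


def difference_matrix(p: int) -> Matrix:
    """Q in R^{(p-1) x p}: row i has +1 in column i and -1 in column i+1."""
    Q = [[0.0] * p for _ in range(p - 1)]
    for i in range(p - 1):
        Q[i][i] = 1.0
        Q[i][i + 1] = -1.0
    return Q


def fused_lasso_D(p: int, lam1: float, lam2: float) -> Matrix:
    """Stack lam1 * I_{p x p} on top of lam2 * Q, i.e. D = [lam1 I; lam2 Q]."""
    top = [[lam1 if i == j else 0.0 for j in range(p)] for i in range(p)]
    bottom = [[lam2 * q for q in row] for row in difference_matrix(p)]
    return top + bottom


def l1_of_product(M: Matrix, v: List[float]) -> float:
    """||M v||_1 for row-major lists."""
    return sum(abs(sum(m * x for m, x in zip(row, v))) for row in M)


if __name__ == "__main__":
    theta = [1.0, 1.0, 3.0, -1.0]
    lam1, lam2 = 0.5, 2.0
    D = fused_lasso_D(len(theta), lam1, lam2)
    lhs = l1_of_product(D, theta)
    rhs = lam1 * sum(abs(t) for t in theta) + lam2 * l1_of_product(difference_matrix(4), theta)
    print(lhs, rhs)  # both equal 15.0
```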



The general $K$-dimensional change point detection problem [Candès et al., 2006] can be expressed by

$$\min_{\theta}\ \frac{1}{2}\sum_{(i_1,\ldots,i_K)\in S}\left(c_{i_1,\ldots,i_K} - \theta_{i_1,\ldots,i_K}\right)^2 + \lambda\sum_{i_1=1}^{I_1-1}\cdots\sum_{i_K=1}^{I_K-1}\left(|\theta_{i_1,i_2,\ldots,i_K} - \theta_{i_1+1,i_2,\ldots,i_K}| + \cdots + |\theta_{i_1,\ldots,i_K} - \theta_{i_1,\ldots,i_K+1}|\right) \tag{4}$$

where $\theta \in \mathbb{R}^{I_1\times I_2\times\cdots\times I_K}$ is a $K$-dimensional tensor with a stepwise structure and $S$ is the set of indices. The second term is used to measure the total variation. A change point is defined as a point where the signal changes. One can properly define $D$ to rewrite Eq. (4) in the form of Eq. (2). In addition, if the structure of the signal is piecewise linear, then one can replace the second term by

$$\sum_{i_1=2}^{I_1-1}\cdots\sum_{i_K=2}^{I_K-1}\left(|2\theta_{i_1,\ldots,i_K} - \theta_{i_1+1,i_2,\ldots,i_K} - \theta_{i_1-1,i_2,\ldots,i_K}| + \cdots + |2\theta_{i_1,\ldots,i_K} - \theta_{i_1,\ldots,i_K+1} - \theta_{i_1,\ldots,i_K-1}|\right).$$

It can be written in the form of Eq. (2) as well.

1 Note that this "identical distribution" assumption can be removed; see Zhang [2009a]. For simplification of analysis, we enforce this condition throughout this paper.
2 We study the most general case of $D$, and thus our analysis is applicable for both $m \geq p$ and $m \leq p$.

The second term of (4), that is, the total variation, is defined as the sum of differences between two neighboring entries (or nodes). A graph can generalize this definition by using edges, rather than entry indices, to define neighboring entries. Let $G(V, E)$ be a graph. One has

$$\min_{\theta}\ \frac{1}{2}\|\Phi\theta - c\|^2 + \lambda\sum_{(i,j)\in E}|\theta_i - \theta_j| \tag{5}$$

where $\sum_{(i,j)\in E}|\theta_i - \theta_j|$ defines the total variation over the graph $G$. The $k$th edge between nodes $i$ and $j$ corresponds to the $k$th row of the matrix $D \in \mathbb{R}^{|E|\times|V|}$, with zeros at all entries except $D_{ki} = 1$ and $D_{kj} = -1$. Taking $\Phi = I_{p\times p}$, one obtains the edge LASSO [Sharpnack et al., 2012].
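The graph construction can be sketched in the same spirit. The code below (hypothetical helper names; nodes are 0-based in the code) forms $D \in \mathbb{R}^{|E|\times|V|}$ from an edge list as described, one row per edge with $+1$ at node $i$ and $-1$ at node $j$, and checks that $\|D\theta\|_1$ equals the graph total variation $\sum_{(i,j)\in E}|\theta_i - \theta_j|$.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def incidence_matrix(num_nodes: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Row k has +1 at column i and -1 at column j for the k-th edge (i, j)."""
    D = []
    for i, j in edges:
        row = [0.0] * num_nodes
        row[i] = 1.0
        row[j] = -1.0
        D.append(row)
    return D


def graph_total_variation(theta: List[float], edges: List[Tuple[int, int]]) -> float:
    """Sum over edges of |theta_i - theta_j|."""
    return sum(abs(theta[i] - theta[j]) for i, j in edges)


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]  # a 4-cycle
    theta = [2.0, 2.0, 5.0, 4.0]
    D = incidence_matrix(4, edges)
    tv_via_D = sum(abs(sum(d * t for d, t in zip(row, theta))) for row in D)
    print(tv_via_D, graph_total_variation(theta, edges))  # both 6.0
```

Since every row sums to zero, $D\mathbf{1} = 0$, so such a $D$ is always rank deficient; this is one reason the analysis works with the smallest nonzero singular value of $D$.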



This paper studies the theoretical properties of problem (2) by providing an upper bound of the estimate error, that is, $\|\hat\theta - \theta^*\|$, where $\hat\theta$ denotes the estimate. The consistency property of this model is shown by assuming that the design matrix $\Phi$ is a Gaussian random matrix. Specifically, we show 1) in the noiseless case, if the condition number of $D$ is bounded and the measurement number $n \geq O(s\log(p))$, where $s$ is the sparsity number, then the true solution can be recovered under some mild conditions with high probability; and 2) in the noisy case, if the condition number of $D$ is bounded and the measurement number increases faster than $s\log(p)$, that is, $s\log(p) = o(n)$, then the estimate error converges to zero with probability 1 under some mild conditions when $p$ goes to infinity. Our results are consistent with those for the special case $D = I_{p\times p}$ (equivalently LASSO) and improve the existing analysis in [Candès et al., 2011; Vaiter et al., 2012]. To the best of our knowledge, this is the first work that establishes the consistency properties for the general problem (2). The condition number of $D$ plays a critical role in our analysis. We consider the condition numbers in two cases, the fused LASSO and the random graph: the condition number in the fused LASSO case is bounded by a constant, while the condition number in the random graph case is bounded with high probability if $m/p$ (that is, #edges/#vertices) is larger than a certain constant. Numerical simulations are consistent with our theoretical results.

1.1 Notations and Assumptions 

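Define

$$\rho^+_{\Psi,Y}(l_1, l_2) = \max_{h\in\mathbb{R}^{l_1}\times\mathcal{H}(Y,l_2)}\frac{\|\Psi h\|^2}{\|h\|^2}, \qquad \rho^-_{\Psi,Y}(l_1, l_2) = \min_{h\in\mathbb{R}^{l_1}\times\mathcal{H}(Y,l_2)}\frac{\|\Psi h\|^2}{\|h\|^2},$$

where $l_1$ and $l_2$ are nonnegative integers, and $\mathcal{H}(Y, l_2)$ is the union of all subspaces spanned by $l_2$ columns of $Y$:

$$\mathcal{H}(Y, l_2) = \{Yv : \|v\|_0 \leq l_2\}.$$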
Note that the length of $h$ is the sum of $l_1$ and the dimension of the space containing $\mathcal{H}(Y, l_2)$ (which is in general not equal to $l_2$). The definitions of $\rho^+_{\Psi,Y}(l_1, l_2)$ and $\rho^-_{\Psi,Y}(l_1, l_2)$ combine the ideas of RIP [Candès and Tao, 2005] and D-RIP [Candès et al., 2011]. One can verify that if $\Psi$ satisfies the RIP condition in terms of the sparsity $l_1$, then the RIP constant equals $\max\{\rho^+_{\Psi,Y}(l_1, 0) - 1,\ 1 - \rho^-_{\Psi,Y}(l_1, 0)\}$. Similarly, the D-RIP constant equals $\max\{\rho^+_{\Psi,Y}(0, l_2) - 1,\ 1 - \rho^-_{\Psi,Y}(0, l_2)\}$ if $\Psi$ satisfies the D-RIP condition in terms of the sparsity $l_2$ and the dictionary $Y$. For short, we denote $\rho^+_{\Psi,Y}(0, l_2)$ and $\rho^-_{\Psi,Y}(0, l_2)$ as $\rho^+_{\Psi,Y}(l_2)$ and $\rho^-_{\Psi,Y}(l_2)$, respectively.

Denote the compact singular value decomposition (SVD) of $D$ as $D = U\Sigma V^T$. Let $Z = U\Sigma$ and its pseudo-inverse be $Z^+ = \Sigma^{-1}U^T$. One can verify that $Z^+Z = I$. $\sigma_{\min}(D)$ denotes the minimal nonzero singular value of $D$ and $\sigma_{\max}(D)$ denotes the maximal one, that is, the spectral norm $\|D\|$. One has $\sigma_{\min}(D) = \sigma_{\min}(Z) = \sigma_{\max}^{-1}(Z^+)$ and $\sigma_{\max}(D) = \sigma_{\max}(Z) = \sigma_{\min}^{-1}(Z^+)$. Define

$$\kappa := \frac{\sigma_{\max}(D)}{\sigma_{\min}(D)} = \frac{\sigma_{\max}(Z)}{\sigma_{\min}(Z)}.$$

Let $T_0$ be the support set of $D\theta^*$, that is, a subset of $\{1, 2, \ldots, m\}$, with $s := |T_0|$. Denote by $T_0^c$ its complementary index set with respect to $\{1, 2, \ldots, m\}$. Without loss of generality, we assume that $D$ does not contain zero rows. Assume that $c = \Phi\theta^* + e$, where $e \in \mathbb{R}^n$ and all entries $e_i$ are i.i.d. centered sub-Gaussian random variables with sub-Gaussian norm $\sigma$ (readers who are not familiar with the sub-Gaussian norm can treat $\sigma$ as the standard deviation of a Gaussian random variable). In discussing the dimensions of the problem and how they are related to each other in the limit (as $n$ and $p$ both approach $\infty$), we make use of order notation. If $\alpha$ and $\beta$ are both positive quantities that depend on the dimensions, we write $\alpha = O(\beta)$ if $\alpha$ can be bounded by a fixed multiple of $\beta$ for all sufficiently large dimensions. We write $\alpha = o(\beta)$ if for any positive constant $\phi > 0$, we have $\alpha \leq \phi\beta$ for all sufficiently large dimensions. We write $\alpha = \Omega(\beta)$ if both $\alpha = O(\beta)$ and $\beta = O(\alpha)$. Throughout this paper, a Gaussian random matrix means that all entries follow the i.i.d. standard Gaussian distribution $\mathcal{N}(0, 1)$. Denote the $\ell_{\infty,2}$ norm of $Q \in \mathbb{R}^{m\times n}$ as $\|Q\|_{\infty,2} = \max_{j\in\{1,\ldots,n\}}\|Q_j\|$, where $Q_j$ is the $j$th column of $Q$.
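For concreteness, the $\ell_{\infty,2}$ norm $\|Q\|_{\infty,2} = \max_j\|Q_j\|$ is simply the largest Euclidean norm among the columns of $Q$. A minimal plain-Python sketch (the function name is hypothetical):

```python
import math
from typing import List


def linf2_norm(Q: List[List[float]]) -> float:
    """||Q||_{inf,2}: the maximum over columns j of the Euclidean norm of column j."""
    num_cols = len(Q[0])
    return max(math.sqrt(sum(row[j] ** 2 for row in Q)) for j in range(num_cols))


if __name__ == "__main__":
    Q = [[3.0, 0.0],
         [4.0, 1.0]]
    print(linf2_norm(Q))  # column norms are 5.0 and 1.0, so this prints 5.0
```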

1.2 Related Work 



[Candès et al., 2011] proposed the following formulation to solve the problem in this paper:

$$\min_{\theta}\ \|D\theta\|_1 \quad \text{s.t.}\quad \|\Phi\theta - c\| \leq \varepsilon, \tag{6}$$

where $D \in \mathbb{R}^{m\times p}$ is assumed to have orthogonal columns and $\varepsilon$ is taken as an upper bound of $\|e\|$. They showed that the estimate error is bounded by $C_0\varepsilon + C_1\|(D\theta^*)_{T^c}\|_1/\sqrt{|T|}$ with high probability if $\sqrt{n}\,\Phi \in \mathbb{R}^{n\times p}$ is a Gaussian random matrix³ with $n \geq \Omega(s\log m)$, where $C_0$ and $C_1$ are two constants. Letting $T = T_0$ and $\varepsilon = \|e\|$, the error bound becomes $C_0\|e\|$. This result shows that in the noiseless case, with high probability, the true signal can be exactly recovered. In the noisy case, assume that the $e_i$'s ($i = 1, \ldots, n$) are i.i.d. centered sub-Gaussian random variables, which implies that $\|e\|^2$ is bounded by $\Omega(n)$ with high probability. Note that since the measurement matrix $\Phi$ is scaled by $1/\sqrt{n}$ in the definition of "Gaussian random matrix" in [Candès et al., 2011], the noise vector should be scaled similarly. In other words, $\|e\|^2$ should be bounded by $\Omega(1)$ rather than $\Omega(n)$, which implies that the estimate error in [Candès et al., 2011] converges to a constant asymptotically.



3 Note that the "Gaussian random matrix" defined in [Candès et al., 2011] is slightly different from ours. In [Candès et al., 2011], $\Phi \in \mathbb{R}^{n\times p}$ is a Gaussian random matrix if each entry of $\Phi$ is generated from $\mathcal{N}(0, 1/n)$. Please refer to Section 1.5 in [Candès et al., 2011]. Here we only restate the result in [Candès et al., 2011] using our definition of Gaussian random matrices.
[Nam et al., 2012] considered the noiseless case and analyzed the formulation

$$\min_{\theta}\ \|D\theta\|_1 \quad \text{s.t.}\quad \Phi\theta = c, \tag{7}$$

assuming all rows of $D$ to be in general position, that is, any $p$ rows of $D$ are linearly independent, an assumption that is violated by the fused LASSO. A sufficient condition was proposed to recover the true signal $\theta^*$ using cosparse analysis.



[Vaiter et al., 2012] also considered the formulation in Eq. (2), but mainly gave a robustness analysis for this model using the cosparse technique. A sufficient condition [different from Nam et al., 2012] to exactly recover the true signal was given in the noiseless case. In the noisy case, they took $\lambda$ to be a value proportional to $\|e\|$ and proved that the estimate error is bounded by $O(\|e\|)$ under certain conditions. They did not consider Gaussian ensembles for $\Phi$; see [Vaiter et al., 2012, Section 3.B]. We applied Gaussian ensembles to their result, but did not obtain the consistency property; see the Appendix for details.

The fused LASSO, a special case of Eq. (2), has also been studied recently. A sufficient condition for detecting jump points is given by [Kolar et al., 2009]. A special fused LASSO formulation was considered by Rinaldo [2009], in which $\Phi$ was set to be the identity matrix and $D$ to be the combination of the identity matrix and the total variation matrix. [Sharpnack et al., 2012] proposed and studied the edge LASSO by letting $\Phi$ be the identity matrix and $D$ be the matrix corresponding to the edges of a graph.



1.3 Organization 

The remainder of this paper is organized as follows. To build up a uniform analysis framework, we simplify the formulation (2) in Section 2. The main results are presented in Section 3. Section 4 analyzes the value of an important parameter in our main results in two cases: the fused LASSO and the random graph. Numerical simulations are presented in Section 5 to verify the relationship between the estimate error and the condition number. We conclude this paper in Section 6. All proofs are provided in the Appendix.



2 Simplification 



As highlighted by [Vaiter et al., 2012], the analysis for a wide $D \in \mathbb{R}^{m\times p}$ (that is, $p > m$) significantly differs from that for a tall $D$ (that is, $p < m$). To build up a uniform analysis framework, we use the singular value decomposition (SVD) of $D$ to simplify Eq. (2), which leads to an equivalent formulation.

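Consider the compact SVD of $D$: $D = U\Sigma V_\beta^T$, where $U \in \mathbb{R}^{m\times r}$, $\Sigma \in \mathbb{R}^{r\times r}$ ($r$ is the rank of $D$), and $V_\beta \in \mathbb{R}^{p\times r}$. We then construct $V_\alpha \in \mathbb{R}^{p\times(p-r)}$ such that

$$V := [V_\alpha\ \ V_\beta]$$

is a unitary matrix. Let $\beta = V_\beta^T\theta$ and $\alpha = V_\alpha^T\theta$. These two linear transformations split the original signal into two parts as follows:

$$\begin{aligned}
\min_{\theta}\ \frac{1}{2}\|\Phi\theta - c\|^2 + \lambda\|D\theta\|_1
&= \min_{\alpha,\beta}\ \frac{1}{2}\left\|\Phi[V_\alpha\ \ V_\beta]\begin{bmatrix}\alpha\\ \beta\end{bmatrix} - c\right\|^2 + \lambda\|U\Sigma V_\beta^T\theta\|_1 \qquad (8)\\
&= \min_{\alpha,\beta}\ \frac{1}{2}\left\|[\Phi V_\alpha\ \ \Phi V_\beta]\begin{bmatrix}\alpha\\ \beta\end{bmatrix} - c\right\|^2 + \lambda\|U\Sigma\beta\|_1 \qquad (9)\\
&= \min_{\alpha,\beta}\ \frac{1}{2}\|A\alpha + B\beta - c\|^2 + \lambda\|Z\beta\|_1 \qquad (10)
\end{aligned}$$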
where $A = \Phi V_\alpha \in \mathbb{R}^{n\times(p-r)}$, $B = \Phi V_\beta \in \mathbb{R}^{n\times r}$, and $Z = U\Sigma \in \mathbb{R}^{m\times r}$. Let $\hat\alpha, \hat\beta$ be the solution of Eq. (10). One can see the relationship between $\hat\alpha$ and $\hat\beta$: $\hat\alpha = -(A^TA)^{-1}A^T(B\hat\beta - c)$⁴, which can be used to further simplify Eq. (10):

$$\min_{\beta}\ f(\beta) := \frac{1}{2}\|(I - A(A^TA)^{-1}A^T)(B\beta - c)\|^2 + \lambda\|Z\beta\|_1.$$

Let

$$X = (I - A(A^TA)^{-1}A^T)B \quad\text{and}\quad y = (I - A(A^TA)^{-1}A^T)c.$$

We obtain the following simplified formulation:

$$\min_{\beta}\ f(\beta) = \frac{1}{2}\|X\beta - y\|^2 + \lambda\|Z\beta\|_1, \tag{11}$$

where $X \in \mathbb{R}^{n\times r}$ and $Z \in \mathbb{R}^{m\times r}$.

Denote the solution of Eq. (2) as $\hat\theta$ and the ground truth as $\theta^*$. One can verify $\hat\theta = V[\hat\alpha^T\ \hat\beta^T]^T$. Define $\alpha^* := V_\alpha^T\theta^*$ and $\beta^* := V_\beta^T\theta^*$. Note that, unlike for $\hat\alpha$ and $\hat\beta$, the following usually does not hold: $\alpha^* = -(A^TA)^{-1}A^T(B\beta^* - c)$. Let $h = \hat\beta - \beta^*$ and $d = \hat\alpha - \alpha^*$. We will study the upper bound of $\|\hat\theta - \theta^*\|$ in terms of $\|h\|$ and $\|d\|$ based on the relationship $\|\hat\theta - \theta^*\| \leq \|h\| + \|d\|$.
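The elimination of $\alpha$ can be checked numerically. The sketch below is a plain-Python illustration under stated assumptions: $A^TA$ is invertible (as in footnote 4), a small Gaussian elimination routine stands in for any linear solver, and all helper names are hypothetical. It computes $\hat\alpha(\beta) = -(A^TA)^{-1}A^T(B\beta - c)$ and the residual $A\hat\alpha(\beta) + B\beta - c$, which equals $(I - A(A^TA)^{-1}A^T)(B\beta - c) = X\beta - y$ and is therefore orthogonal to the columns of $A$.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def matmul(M: Matrix, N: Matrix) -> Matrix:
    Nt = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in Nt] for row in M]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def solve(M: Matrix, rhs: Vector) -> Vector:
    """Solve M x = rhs by Gaussian elimination with partial pivoting (M square, invertible)."""
    n = len(M)
    aug = [list(row) + [b] for row, b in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A^T A is (numerically) singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def optimal_alpha(A: Matrix, B: Matrix, beta: Vector, c: Vector) -> Vector:
    """alpha_hat(beta) = -(A^T A)^{-1} A^T (B beta - c), from the normal equations in alpha."""
    At = transpose(A)
    r = [b - ci for b, ci in zip(matvec(B, beta), c)]
    return [-x for x in solve(matmul(At, A), matvec(At, r))]


def projected_residual(A: Matrix, B: Matrix, beta: Vector, c: Vector) -> Vector:
    """A alpha_hat + B beta - c, i.e. (I - A (A^T A)^{-1} A^T)(B beta - c) = X beta - y."""
    alpha = optimal_alpha(A, B, beta, c)
    return [a + b - ci for a, b, ci in zip(matvec(A, alpha), matvec(B, beta), c)]


if __name__ == "__main__":
    A = [[1.0], [1.0], [0.0]]
    B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    c = [1.0, 2.0, 3.0]
    beta = [0.5, -1.0]
    print(optimal_alpha(A, B, beta, c))    # [1.75]
    res = projected_residual(A, B, beta, c)
    print(res, matvec(transpose(A), res))  # residual is orthogonal to A: [0.0]
```

The orthogonality check is the reason the free part can be dropped from the analysis: once $\beta$ is fixed, the best $\alpha$ removes exactly the component of $B\beta - c$ lying in the range of $A$.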

3 Main Results 

This section presents the main results of this paper. The estimate error of Eq. (2), or equivalently Eq. (11), is given in Theorem 1.

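Theorem 1. Define

$$\begin{aligned}
W_{Xh,1} &:= \rho^-_{X,Z^+}(s+l),\\
W_{Xh,2} &:= 6\,\sigma_{\min}^{-1}(Z)\,\rho^+_{X,Z^+}(s+l)\sqrt{s/l},\\
W_{d,1} &:= \tfrac{1}{2}\,\sigma_{\min}^{-1}(A^TA)\big(\rho^+(p-r, s+l) - \rho^-(p-r, s+l)\big),\\
W_{d,2} &:= \tfrac{3}{2}\,\sigma_{\min}^{-1}(A^TA)\,\sigma_{\min}^{-1}(Z)\sqrt{s/l}\,\big(\rho^+(p-r, l) - \rho^-(p-r, l)\big),\\
W_\sigma &:= \frac{\sigma_{\max}(Z)\,\sigma_{\min}(Z)}{\sigma_{\min}(Z) - 3\sqrt{s/l}\,\sigma_{\max}(Z)},\\
W_h &:= 3\sqrt{s/l}\,\sigma_{\min}^{-1}(Z),
\end{aligned}$$

where $\rho^+(p-r, \cdot)$ and $\rho^-(p-r, \cdot)$ denote $\rho^+_{[A,B],Z^+}(p-r, \cdot)$ and $\rho^-_{[A,B],Z^+}(p-r, \cdot)$, respectively, for short. Taking $\lambda > 2\|(Z^+)^TX^Te\|_\infty$ in Eq. (2), we have: if $A^TA$ is invertible (apparently, $n \geq p - r$ is required) and there exists an integer $l > 9\kappa^2 s$ such that $W_{Xh,1} - W_{Xh,2}W_\sigma > 0$, then

$$\|\hat\theta - \theta^*\| \leq W_\theta\sqrt{s}\,\lambda + \|(A^TA)^{-1}A^Te\| \tag{12}$$

4 Here we assume that $A^TA$ is invertible.

where

$$W_\theta = 6\,\frac{(1 + W_{d,1})W_\sigma + (W_h + W_{d,2})W_\sigma^2}{W_{Xh,1} - W_{Xh,2}W_\sigma}.$$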



One can see from the proof that the first term of (12) is mainly due to the estimate error of the sparse part $\beta$ and the second term is due to the estimate error of the free part $\alpha$.

The upper bound in Eq. (12) strongly depends on parameters of $X$ and $Z^+$ such as $\rho^+_{X,Z^+}(\cdot)$, $\rho^-_{X,Z^+}(\cdot)$, $\rho^+(\cdot,\cdot)$, and $\rho^-(\cdot,\cdot)$. Although $X$ and $Z^+$ are fixed for a given $\Phi$ and $D$, it is still challenging to evaluate these parameters. Similar to existing literature such as [Candès and Tao, 2005], we assume $\Phi$ to be a Gaussian random matrix and estimate the values of these parameters in Theorem 2.

Theorem 2. Assume that $\Phi$ is a Gaussian random matrix. The following holds with probability at least $1 - 2\exp\{-\Omega(k\log(em/k))\}$:

$$\sqrt{\rho^+_{X,Z^+}(k)} \leq \sqrt{n + r - p} + O\left(\sqrt{k\log(em/k)}\right) \tag{13}$$

$$\sqrt{\rho^-_{X,Z^+}(k)} \geq \sqrt{n + r - p} - O\left(\sqrt{k\log(em/k)}\right). \tag{14}$$

The following holds with probability at least $1 - 2\exp\{-\Omega((k + p - r)\log(ep/(k + p - r)))\}$:

$$\sqrt{\rho^+_{[A,B],Z^+}(p - r, k)} \leq \sqrt{n} + O\left(\sqrt{(k + p - r)\log(ep/(k + p - r))}\right) \tag{15}$$

$$\sqrt{\rho^-_{[A,B],Z^+}(p - r, k)} \geq \sqrt{n} - O\left(\sqrt{(k + p - r)\log(ep/(k + p - r))}\right). \tag{16}$$

Now we are ready to analyze the estimate error given in Eq. (12). Two cases are considered in the following: the noiseless case $e = 0$ and the noisy case $e \neq 0$.

3.1 Noiseless Case $e = 0$

First let us consider the noiseless case. Since $e = 0$, the second term in Eq. (12) vanishes. We can choose a value of $\lambda$ to make the first term in Eq. (12) arbitrarily small. Hence the true signal $\theta^*$ can be recovered with arbitrary precision as long as $W_\theta > 0$, which is equivalent to requiring $W_{Xh,1} - W_{Xh,2}W_\sigma > 0$. In fact, when $\lambda$ is extremely small, Eq. (2) approximately solves the problem in Eq. (6) with $\varepsilon = 0$.

Intuitively, the larger the measurement number $n$ is, the more easily the true signal $\theta^*$ can be recovered, since more measurements give a feasible subspace with a lower dimension. In order to estimate how many measurements are required, we consider the measurement matrix $\Phi$ to be a Gaussian random matrix (this is also a standard setup in compressive sensing). Since this paper mainly focuses on the large scale case, one can treat the value of $l$ as a number proportional to $\kappa^2 s$.

Using Eq. (13) and Eq. (14), we can estimate the lower bound of $W_{Xh,1} - W_{Xh,2}W_\sigma$ in Lemma 1.

Lemma 1. Assume $\Phi$ to be a Gaussian random matrix. Let $l = (10\kappa)^2 s$. With probability at least $1 - 2\exp\{-\Omega((s + l)\log(em/(s + l)))\}$, we have

$$W_{Xh,1} - W_{Xh,2}W_\sigma \geq \frac{1}{7}(n + r - p) - O\left(\sqrt{(n + r - p)(s + l)\log\left(\frac{em}{s + l}\right)}\right). \tag{17}$$
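To see where the constant $\frac{1}{7}$ in Eq. (17) can come from, here is a short sketch of the arithmetic. It assumes the definitions $W_{Xh,1} = \rho^-_{X,Z^+}(s+l)$, $W_{Xh,2} = 6\sigma_{\min}^{-1}(Z)\rho^+_{X,Z^+}(s+l)\sqrt{s/l}$, and $W_\sigma = \sigma_{\max}(Z)\sigma_{\min}(Z)/(\sigma_{\min}(Z) - 3\sqrt{s/l}\,\sigma_{\max}(Z))$, and abbreviates $\rho^\pm := \rho^\pm_{X,Z^+}(s+l)$. With $l = (10\kappa)^2 s$ we have $\sqrt{s/l} = 1/(10\kappa)$ and $\sigma_{\max}(Z)/\kappa = \sigma_{\min}(Z)$, so

```latex
\begin{aligned}
W_\sigma &= \frac{\sigma_{\max}(Z)\,\sigma_{\min}(Z)}{\sigma_{\min}(Z) - \frac{3}{10\kappa}\,\sigma_{\max}(Z)}
          = \frac{\sigma_{\max}(Z)\,\sigma_{\min}(Z)}{\sigma_{\min}(Z)\left(1 - \frac{3}{10}\right)}
          = \frac{10}{7}\,\sigma_{\max}(Z),\\
W_{Xh,2}W_\sigma &= 6\,\sigma_{\min}^{-1}(Z)\,\rho^{+}\cdot\frac{1}{10\kappa}\cdot\frac{10}{7}\,\sigma_{\max}(Z)
          = \frac{6}{7}\,\rho^{+},\\
W_{Xh,1} - W_{Xh,2}W_\sigma &= \rho^{-} - \frac{6}{7}\,\rho^{+}.
\end{aligned}
```

Writing $t = \sqrt{n + r - p}$ and $\delta = O(\sqrt{(s+l)\log(em/(s+l))})$, Eqs. (13) and (14) with $k = s + l$ give, whenever $t \geq \delta$, $\rho^- - \frac{6}{7}\rho^+ \geq (t - \delta)^2 - \frac{6}{7}(t + \delta)^2 = \frac{1}{7}t^2 - \frac{26}{7}t\delta + \frac{1}{7}\delta^2$, which has the form $\frac{1}{7}(n + r - p) - O(t\delta)$ stated in Eq. (17).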
From Lemma 1, to recover the true signal, we only need

$$(n + r - p) \geq O\left((s + l)\log(em/(s + l))\right).$$

To simplify the discussion, we first propose several minor conditions in Assumption 1.

Assumption 1. Assume that

• $p - r \leq \phi n$ ($\phi < 1$) in the noiseless case and $p - r \leq O(s)$ in the noisy case⁵;

• the condition number $\kappa = \frac{\sigma_{\max}(D)}{\sigma_{\min}(D)} = \frac{\sigma_{\max}(Z)}{\sigma_{\min}(Z)}$ is bounded;

• $m = \Omega(p^{\iota})$ where $\iota > 0$, that is, $m$ can be a polynomial function in terms of $p$.

One can verify that under Assumption 1, taking $l = (10\kappa)^2 s = \Omega(s)$, the right hand side of (17) is greater than

$$\Omega(n) - \Omega\left(\sqrt{ns\log(em/s)}\right) = \Omega(n) - \Omega\left(\sqrt{ns\log(ep/s)}\right).$$

Letting $n \geq O(s\log(ep/s))$ [or $n \geq O(s\log(em/s))$ without assuming $m = \Omega(p^{\iota})$], one obtains that

$$W_{Xh,1} - W_{Xh,2}W_\sigma \geq \Omega(n) - \Omega\left(\sqrt{ns\log(ep/s)}\right) > 0$$

holds with high probability (since the probability in Lemma 1 converges to 1 as $p$ goes to infinity). In other words, in the noiseless case the true signal can be recovered at an arbitrary precision with high probability.



To compare with existing results, we consider two special cases: $D = I_{p\times p}$ [Candès and Tao, 2005] and $D$ with orthogonal columns [Candès et al., 2011], that is, $D^TD = I$. When $D = I_{p\times p}$ and $\Phi$ is a Gaussian random matrix, the number of measurements required in [Candès and Tao, 2005] is $O(s\log(ep/s))$, which is the same as ours. Also note that if $D = I_{p\times p}$, Assumption 1 is satisfied automatically. Thus our result does not enforce any additional condition and is consistent with existing analysis for the special case $D = I_{p\times p}$.

Next we consider the case when $D$ has orthogonal columns as in [Candès et al., 2011]. In this situation, all conditions in Assumption 1 except $m = \Omega(p^{\iota})$ are satisfied. From our analysis above, one can easily verify that the number of measurements required to recover the true signal is $\Omega(s\log(em/s))$ without assuming $m = \Omega(p^{\iota})$, which is consistent with the result in [Candès et al., 2011].



3.2 Noisy Case $e \neq 0$

Next we consider the noisy case, that is, we study the upper bound in (12) when $e \neq 0$. Similarly, we mainly focus on the large scale case and assume Gaussian ensembles for the measurement matrix $\Phi$. Theorem 3 provides the upper bound of the estimate error under the conditions in Assumption 1.

Theorem 3. Assume that the measurement matrix is a Gaussian random matrix, the measurement number satisfies $n \geq O(s\log p)$, and Assumption 1 holds. Taking $\lambda = C\|(Z^+)^TX^Te\|_\infty$ with $C > 2$ in Eq. (2), we have

$$\|\hat\theta - \theta^*\| \leq O\left(\sqrt{\frac{s\log p}{n}}\right) \tag{18}$$

with probability at least $1 - \Omega(p^{-1}) - \Omega(m^{-1}) - \exp\{-\Omega(s\log(ep/s))\}$.



5 This assumption indicates that the free dimension of the true signal $\theta^*$ (that is, the dimension of the free part $\alpha \in \mathbb{R}^{p-r}$) should not be too large. Intuitively, one needs more measurements to recover the free part because it has no sparsity constraint, and much fewer measurements to recover the sparse part. Thus, if only limited measurements are available, we have to restrict the dimension of the free part.
[Figure 1 panels: p = 50, n = 40; p = 150, n = 100; p = 300, n = 200. Horizontal axis: condition number of D (logarithmic scale, 1 to 1000).]

Figure 1: Illustration of the relationship between condition number and performance in terms of relative 
error. Three problem sizes are used as examples. 



One can verify that when $p$ goes to infinity, the upper bound in (18) converges to 0 provided that $s\log p = o(n)$, and the probability converges to 1 due to $m = \Omega(p^{\iota})$. This means that the estimate error converges to 0 asymptotically, given that the number of measurements satisfies $s\log p = o(n)$.
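A quick numerical illustration of the rate in Eq. (18), with every constant hidden by $O(\cdot)$ set to 1 (an arbitrary choice made only for illustration): if $n$ grows like $s(\log p)^2$, so that $s\log p = o(n)$, the rate $\sqrt{s\log p/n}$ decays like $1/\sqrt{\log p}$, whereas with $n$ exactly proportional to $s\log p$ it stays constant.

```python
import math
from typing import List


def rate(s: int, p: int, n: float) -> float:
    """sqrt(s log p / n), the order of the bound in Eq. (18) with constants set to 1."""
    return math.sqrt(s * math.log(p) / n)


def rates(s: int, ps: List[int], grow_faster: bool) -> List[float]:
    """Rates when n = s (log p)^2 (grows faster than s log p) or n = s log p."""
    out = []
    for p in ps:
        n = s * math.log(p) ** 2 if grow_faster else s * math.log(p)
        out.append(rate(s, p, n))
    return out


if __name__ == "__main__":
    ps = [10**2, 10**4, 10**6, 10**8]
    print([round(r, 3) for r in rates(10, ps, grow_faster=True)])   # decreasing, like 1/sqrt(log p)
    print([round(r, 3) for r in rates(10, ps, grow_faster=False)])  # stays at 1.0
```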

This result shows the consistency property, that is, if the measurement number $n$ grows faster than $s\log(p)$, the estimate error will vanish. This consistency property agrees with the special case LASSO obtained by taking $D = I_{p\times p}$ [Zhang, 2009a]. [Candès et al., 2011] considered Eq. (6) and obtained an upper bound $\Omega(\varepsilon/\sqrt{n})$ for the estimate error, which does not guarantee the consistency property like ours since $\varepsilon = \Omega(\|e\|) = \Omega(\sqrt{n})$. Their result only guarantees that the estimation error bound converges to a constant given $n \geq O(s\log p)$.

In addition, from Eq. (17), one can verify that the boundedness requirement on $\kappa$ can actually be removed if we allow more observations, for example, $n = O(\kappa^2 s\log p)$. Here we enforce the boundedness condition only to simplify the analysis and to allow a convenient comparison with the standard LASSO (which needs $n = O(s\log p)$ measurements).

4 The Condition Number of D 



Since $\kappa$ is a key factor in the derivation of our main results, we consider the fused LASSO and the random graph and estimate the values of $\kappa$ in these two cases.

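Let us consider the fused LASSO first. The transformation matrix $D$ is

$$D = \begin{bmatrix} [I_{(p-1)\times(p-1)}\ \ 0_{p-1}] - [0_{p-1}\ \ I_{(p-1)\times(p-1)}] \\ I_{p\times p} \end{bmatrix}.$$

One can verify that

$$\sigma_{\min}(D) = \min_{\|v\|=1}\|Dv\| \geq \min_{\|v\|=1}\|v\| = 1$$

and

$$\begin{aligned}
\sigma_{\max}(D) = \max_{\|v\|=1}\|Dv\|
&\leq \max_{\|v\|=1}\left\|[I_{(p-1)\times(p-1)}\ \ 0_{p-1}]v - [0_{p-1}\ \ I_{(p-1)\times(p-1)}]v\right\| + \|v\|\\
&\leq \max_{\|v\|=1}\left\|[I_{(p-1)\times(p-1)}\ \ 0_{p-1}]v\right\| + \left\|[0_{p-1}\ \ I_{(p-1)\times(p-1)}]v\right\| + \|v\|\\
&\leq 3,
\end{aligned}$$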
which implies that $\sigma_{\min}(D) \geq 1$ and $\sigma_{\max}(D) \leq 3$. Hence we have $\kappa \leq 3$ in the fused LASSO case.
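The bound $\kappa \leq 3$ can be sanity-checked by sampling. The sketch below (plain Python; helper names and the random seed are arbitrary choices) builds the fused LASSO $D$ above and records $\|Dv\|/\|v\|$ for random Gaussian vectors $v$; every ratio must lie in $[\sigma_{\min}(D), \sigma_{\max}(D)] \subseteq [1, 3]$. Sampling only explores this range; it does not compute the extreme singular values.

```python
import math
import random
from typing import List


def fused_D(p: int) -> List[List[float]]:
    """D = [[I, 0] - [0, I]; I_{p x p}]: p-1 difference rows followed by the identity."""
    rows = []
    for i in range(p - 1):
        row = [0.0] * p
        row[i], row[i + 1] = 1.0, -1.0
        rows.append(row)
    for i in range(p):
        row = [0.0] * p
        row[i] = 1.0
        rows.append(row)
    return rows


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def sampled_ratios(p: int, trials: int, seed: int = 0) -> List[float]:
    """||D v|| / ||v|| for random Gaussian vectors v."""
    rng = random.Random(seed)
    D = fused_D(p)
    out = []
    for _ in range(trials):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        Dv = [sum(d * x for d, x in zip(row, v)) for row in D]
        out.append(norm(Dv) / norm(v))
    return out


if __name__ == "__main__":
    r = sampled_ratios(p=20, trials=1000)
    print(min(r), max(r))  # both inside [1, 3]
```

In fact $\|Dv\|^2 = \|Qv\|^2 + \|v\|^2$ with $\|Qv\| \leq 2\|v\|$, so the sampled ratios never exceed $\sqrt{5}$; the constant 3 in the text is a slightly looser but simpler bound.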

Next we consider the random graph. The transformation matrix $D$ corresponding to a random graph is generated in the following way: (1) each row is independent of the others; (2) two entries of each row are uniformly selected and are set to 1 and $-1$, respectively; (3) the remaining entries are set to 0. The following result shows that the condition number of $D$ is bounded with high probability.
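The random-graph construction can be sketched as follows (plain Python; the function name and seed are hypothetical). Each row picks two distinct columns uniformly at random, which is one reasonable reading of "two entries of each row are uniformly selected". Every row sums to zero, so $D\mathbf{1} = 0$ and the condition number has to be understood through the smallest nonzero singular value, as defined in Section 1.1.

```python
import random
from typing import List, Optional


def random_graph_D(m: int, p: int, seed: Optional[int] = None) -> List[List[float]]:
    """m independent rows; each row has +1 and -1 at two distinct uniformly chosen columns."""
    if p < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    D = []
    for _ in range(m):
        i, j = rng.sample(range(p), 2)  # two distinct columns, uniformly at random
        row = [0.0] * p
        row[i], row[j] = 1.0, -1.0
        D.append(row)
    return D


if __name__ == "__main__":
    D = random_graph_D(m=50, p=10, seed=0)
    print(len(D), all(sum(row) == 0.0 for row in D))  # 50 True
```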

Theorem 4. For any $m$ and $p$ satisfying $m \geq cp$, where $c$ is large enough, the following holds:

with probability at least $1 - 2\exp\{-\Omega(p)\}$.
From this theorem, one can see that

• If $m = cp$, where $c$ is large enough, then $\kappa = \frac{\sigma_{\max}(D)}{\sigma_{\min}(D)}$ is bounded with high probability;

• If $m = p(p-1)/2$, which is the maximal possible $m$, then $\kappa \to 1$.



5 Numerical Simulations 

In this section, we use numerical simulations to verify some of our theoretical results. Given a problem size $n$ and $p$ and a condition number $\kappa$, we randomly generate $D$ as follows. We first construct a $p\times p$ diagonal matrix $D_0$ such that

$$\mathrm{Diag}(D_0) > 0 \quad\text{and}\quad \frac{\max(\mathrm{Diag}(D_0))}{\min(\mathrm{Diag}(D_0))} = \kappa.$$

We then construct a random orthonormal basis matrix $V \in \mathbb{R}^{p\times p}$ and let $D = D_0V$. Clearly, $D$ has independent columns and its condition number equals $\kappa$. Next, a vector $x \in \mathbb{R}^p$ is generated such that $x_i \sim \mathcal{N}(0, 1)$ for $i = 1, \ldots, s$ and $x_j = 0$ for $j = s + 1, \ldots, p$. $\theta^*$ is then obtained as $\theta^* = D^{-1}x$. Finally, we generate a matrix $\Phi \in \mathbb{R}^{n\times p}$ with $\Phi_{ij} \sim \mathcal{N}(0, 1)$, noise $e \in \mathbb{R}^n$ with $e_i \sim \mathcal{N}(0, 0.001)$, and $c = \Phi\theta^* + e$.
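A plain-Python sketch of this data-generating process, for readers who want to reproduce the setup. Several details are assumptions not stated in the text: the diagonal of $D_0$ is spaced linearly between 1 and $\kappa$ (any positive diagonal with ratio $\kappa$ would do), $V$ is obtained by Gram-Schmidt orthonormalization of Gaussian vectors, the number of nonzero entries of $x$ is passed in as `s`, and the second parameter of $\mathcal{N}(0, 0.001)$ is read as a variance. Function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def random_orthonormal(p: int, rng: random.Random) -> Matrix:
    """Rows form an orthonormal basis of R^p (modified Gram-Schmidt on Gaussian vectors)."""
    basis: Matrix = []
    while len(basis) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        nrm = math.sqrt(sum(x * x for x in v))
        if nrm > 1e-8:  # discard (numerically) dependent draws
            basis.append([x / nrm for x in v])
    return basis


def make_instance(n: int, p: int, s: int, kappa: float, noise_var: float = 0.001,
                  seed: int = 0) -> Tuple[Matrix, Vector, Vector, Matrix, Vector]:
    """Return D, theta_star, x = D theta_star, Phi, and c = Phi theta_star + e."""
    if p < 2 or kappa < 1.0:
        raise ValueError("need p >= 2 and kappa >= 1")
    rng = random.Random(seed)
    diag = [1.0 + (kappa - 1.0) * k / (p - 1) for k in range(p)]  # max/min = kappa
    V = random_orthonormal(p, rng)
    D = [[diag[i] * V[i][j] for j in range(p)] for i in range(p)]  # D = D0 V
    x = [rng.gauss(0.0, 1.0) if i < s else 0.0 for i in range(p)]
    # theta* = D^{-1} x = V^T D0^{-1} x because V is orthogonal.
    w = [x[i] / diag[i] for i in range(p)]
    theta = [sum(V[i][j] * w[i] for i in range(p)) for j in range(p)]
    Phi = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    sd = math.sqrt(noise_var)
    c = [sum(a * t for a, t in zip(row, theta)) + rng.gauss(0.0, sd) for row in Phi]
    return D, theta, x, Phi, c


if __name__ == "__main__":
    D, theta, x, Phi, c = make_instance(n=40, p=50, s=5, kappa=10.0)
    Dtheta = [sum(a * t for a, t in zip(row, theta)) for row in D]
    print(max(abs(a - b) for a, b in zip(Dtheta, x)))  # close to 0: D theta* recovers x
```

Because $V$ is orthogonal, the singular values of $D = D_0V$ are exactly the diagonal entries of $D_0$, which is why this construction pins the condition number to $\kappa$.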

We solve Eq. (2) using the standard optimization package CVX⁶, and $\lambda$ is set as $\lambda = 2\|(Z^+)^TX^Te\|_\infty$ as suggested by Theorem 1. We use three different problem sizes, with $n \in \{40, 100, 200\}$, $p \in \{50, 150, 300\}$, and $\kappa$ ranging from 1 to 1000. For each problem setting, 100 random instances are generated and the average performance is reported. We use the relative error $\|\hat\theta - \theta^*\|/\|\theta^*\|$ for evaluation, and present the performance with respect to different condition numbers in Figure 1. We can observe from Figure 1 that in all three cases the relative error increases as the condition number increases. If we fix the condition number, by comparing the three curves, we can see that the relative error decreases as the problem size increases. These observations are consistent with our theoretical results in Section 3 [see Eq. (12)].
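The two quantities used in this paragraph can be sketched as follows. The function names are hypothetical, and $Z^+$ ($r\times m$), $X$ ($n\times r$), and $e$ are assumed to be available as dense lists. In a simulation $e$ is known, which is what makes this choice of $\lambda$ computable; with real data the noise is unobserved. The default constant 2.0 matches the choice $\lambda = 2\|(Z^+)^TX^Te\|_\infty$ described above.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def choose_lambda(Zplus: Matrix, X: Matrix, e: Vector, C: float = 2.0) -> float:
    """lambda = C * ||(Z^+)^T X^T e||_inf, with Z^+ of size r x m and X of size n x r."""
    r = len(Zplus)
    Xt_e = [sum(X[i][j] * e[i] for i in range(len(X))) for j in range(r)]  # X^T e, length r
    m = len(Zplus[0])
    vals = [sum(Zplus[j][k] * Xt_e[j] for j in range(r)) for k in range(m)]  # (Z^+)^T X^T e
    return C * max(abs(v) for v in vals)


def relative_error(theta_hat: Vector, theta_star: Vector) -> float:
    """||theta_hat - theta_star|| / ||theta_star|| (theta_star must be nonzero)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_hat, theta_star)))
    den = math.sqrt(sum(b * b for b in theta_star))
    return num / den


if __name__ == "__main__":
    Zplus = [[1.0, 0.0], [0.0, 2.0]]
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    e = [1.0, -1.0, 0.5]
    print(choose_lambda(Zplus, X, e))              # 3.0
    print(relative_error([1.0, 1.0], [1.0, 0.0]))  # 1.0
```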



6 cvxr.com/cvx/

6 Conclusion and Future Work



This paper considers the problem of estimating a specific type of signal that is sparse under a given linear transformation $D$. A conventional convex relaxation technique is used to convert this NP-hard combinatorial optimization into a tractable problem. We develop a unified framework to analyze the convex formulation with a generic $D$ and provide the estimate error bound. Our main results establish that 1) in the noiseless case, if the condition number of $D$ is bounded and the measurement number $n \geq O(s\log(p))$, where $s$ is the sparsity number, then the true solution can be recovered with high probability; and 2) in the noisy case, if the condition number of $D$ is bounded and the measurement number grows faster than $s\log(p)$ [that is, $s\log(p) = o(n)$], then the estimate error converges to zero with probability 1 when $p$ and $s$ go to infinity. Our results are consistent with existing literature for the special case $D = I_{p\times p}$ (equivalently LASSO) and improve the existing analysis for the same formulation. The condition number of $D$ plays a critical role in our theoretical analysis. We consider the condition numbers in two cases, the fused LASSO and the random graph. The condition number in the fused LASSO case is bounded by a constant, while the condition number in the random graph case is bounded with high probability if $m/p$ (that is, #edges/#vertices) is larger than a certain constant. Numerical simulations are consistent with our theoretical results.
In future work, we plan to study a more general formulation of Eq. (2):

$$\min_{\theta}\ f(\theta) + \lambda\|D\theta\|_1,$$

where $D$ is an arbitrary matrix and $f(\theta)$ is a convex and smooth function satisfying the restricted strong convexity property. We expect to obtain similar consistency properties for this general formulation.
Appendix



A. Proof of Theorem 1

Before presenting the proofs, we first introduce several important definitions. We divide the complementary index set $T_0^c := \{1, 2, \ldots, m\}\setminus T_0$ into a group of disjoint subsets $T_j$ ($j = 1, 2, \ldots, J$) such that $T_1$ is the index set of the $l$ largest entries of $Zh$ within $T_0^c$ (in absolute value), $T_2$ contains the next-largest $l$ entries of $Zh$, and so forth⁷. We also write $T_{01} := T_0 \cup T_1$.
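The block decomposition of $T_0^c$ is easy to state in code. A minimal sketch (plain Python, 0-based indices, hypothetical function name), where `w` plays the role of $Zh$; $T_{01}$ is then $T_0$ together with the first returned block:

```python
from typing import List, Sequence


def partition_complement(w: Sequence[float], T0: Sequence[int], l: int) -> List[List[int]]:
    """Split {0, ..., m-1} minus T0 into blocks T_1, T_2, ... of size l (the last may be
    smaller), ordered by decreasing |w_i|."""
    in_T0 = set(T0)
    rest = [i for i in range(len(w)) if i not in in_T0]
    rest.sort(key=lambda i: abs(w[i]), reverse=True)  # stable sort keeps index order on ties
    return [rest[k:k + l] for k in range(0, len(rest), l)]


if __name__ == "__main__":
    w = [5.0, -1.0, 4.0, 0.5, -3.0, 2.0, 0.0]
    print(partition_complement(w, T0=[0], l=2))  # [[2, 4], [5, 1], [3, 6]]
```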

First we give the proof skeleton of Theorem 1. Recall that the estimate error $\|\hat\theta - \theta^*\|$ is bounded by the sum of the free part error $\|d\|$ (that is, $\|\hat\alpha - \alpha^*\|$) and the sparse part error $\|h\|$ (that is, $\|\hat\beta - \beta^*\|$). Lemma 7 and Lemma 8 study the upper bounds of $\|d\|$ and $\|h\|$, respectively, and the proof of Theorem 1 makes use of these two upper bounds.

Assumption 2. Assume that

$$\|(Z^+)^TX^T(X\beta^* - y)\|_\infty = \|(Z^+)^TX^Te\|_\infty \leq \lambda/2. \tag{19}$$
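Lemma 2. Assume that Assumption 2 holds. We have

$$3\|Z_{T_0}h\|_1 \geq \|Z_{T_0^c}h\|_1.$$

Proof. Since $\hat\beta$ is the optimal solution of Eq. (11), we have

$$\begin{aligned}
0 &\geq \tfrac{1}{2}\|X\hat\beta - y\|^2 - \tfrac{1}{2}\|X\beta^* - y\|^2 + \lambda(\|Z\hat\beta\|_1 - \|Z\beta^*\|_1)\\
&= [X(\beta^* + h/2) - y]^TXh + \lambda(\|Z\hat\beta\|_1 - \|Z\beta^*\|_1)\\
&\geq [X\beta^* - y]^TXh + \lambda(\|Z\hat\beta\|_1 - \|Z\beta^*\|_1)\\
&= h^TZ^T(Z^+)^TX^T(X\beta^* - y) + \lambda(\|Z_{T_0}\hat\beta\|_1 - \|Z_{T_0}\beta^*\|_1 + \|Z_{T_0^c}\hat\beta\|_1 - \|Z_{T_0^c}\beta^*\|_1)\\
&\geq -\|Zh\|_1\|(Z^+)^TX^T(X\beta^* - y)\|_\infty + \lambda(\|Z_{T_0}\hat\beta\|_1 - \|Z_{T_0}\beta^*\|_1 + \|Z_{T_0^c}\hat\beta\|_1 - \|Z_{T_0^c}\beta^*\|_1)\\
&\geq -\|Zh\|_1\lambda/2 + \lambda(\|Z_{T_0}\hat\beta\|_1 - \|Z_{T_0}\beta^*\|_1 + \|Z_{T_0^c}h\|_1) \quad\text{(from Assumption 2 and } Z_{T_0^c}\beta^* = 0)\\
&\geq -(\|Z_{T_0}h\|_1 + \|Z_{T_0^c}h\|_1)\lambda/2 + \lambda(\|Z_{T_0^c}h\|_1 - \|Z_{T_0}h\|_1)\\
&= \tfrac{\lambda}{2}\|Z_{T_0^c}h\|_1 - \tfrac{3\lambda}{2}\|Z_{T_0}h\|_1.
\end{aligned}$$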

It completes the proof. □ 

Lemma 3. For any matrices $P$, $Q$, and $Z$ with compatible dimensions and $k, l_3 \geq 0$, we have

$$\max_{v\in\mathcal{H}(Z,l_3)}\frac{\|P^TQv\|}{\|v\|} \leq \frac{1}{2}\left(\rho^+_{[P,Q],Z}(k, l_3) - \rho^-_{[P,Q],Z}(k, l_3)\right), \tag{20a}$$

where $k$ denotes the number of columns of $P$.



7 The last subset may contain fewer than $l$ elements.
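Proof. The claim follows from

$$\begin{aligned}
\max_{v\in\mathcal{H}(Z,l_3)}\frac{\|P^TQv\|}{\|v\|}
&= \max_{u\in\mathbb{R}^k,\ v\in\mathcal{H}(Z,l_3)}\frac{|u^TP^TQv|}{\|u\|\,\|v\|}\\
&= \max_{\|u\|=1,\ \|v\|=1,\ u\in\mathbb{R}^k,\ v\in\mathcal{H}(Z,l_3)}|u^TP^TQv|\\
&= \max_{\|u\|=1,\ \|v\|=1,\ u\in\mathbb{R}^k,\ v\in\mathcal{H}(Z,l_3)}\frac{1}{4}\left|\left\|[P,Q]\begin{bmatrix}u\\ v\end{bmatrix}\right\|^2 - \left\|[P,Q]\begin{bmatrix}u\\ -v\end{bmatrix}\right\|^2\right|\\
&\leq \max_{\|u\|=1,\ \|v\|=1,\ u\in\mathbb{R}^k,\ v\in\mathcal{H}(Z,l_3)}\frac{1}{4}\left(\rho^+_{[P,Q],Z}(k, l_3)\left\|\begin{bmatrix}u\\ v\end{bmatrix}\right\|^2 - \rho^-_{[P,Q],Z}(k, l_3)\left\|\begin{bmatrix}u\\ -v\end{bmatrix}\right\|^2\right)\\
&= \frac{1}{2}\left(\rho^+_{[P,Q],Z}(k, l_3) - \rho^-_{[P,Q],Z}(k, l_3)\right).
\end{aligned}$$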
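The key step in the proof of Lemma 3 is the polarization identity $u^TP^TQv = \frac{1}{4}\left(\|[P,Q][u; v]\|^2 - \|[P,Q][u; -v]\|^2\right)$, which holds because $[P,Q][u; \pm v] = Pu \pm Qv$. A quick numerical check in plain Python (random data; function names are hypothetical):

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def polarization_sides(P: Matrix, Q: Matrix, u: Vector, v: Vector) -> Tuple[float, float]:
    """Return (u^T P^T Q v, (||Pu + Qv||^2 - ||Pu - Qv||^2) / 4)."""
    Pu, Qv = matvec(P, u), matvec(Q, v)
    plus = [a + b for a, b in zip(Pu, Qv)]
    minus = [a - b for a, b in zip(Pu, Qv)]
    return dot(Pu, Qv), (dot(plus, plus) - dot(minus, minus)) / 4.0


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, q = 6, 3, 4
    P = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    Q = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(k)]
    v = [rng.gauss(0, 1) for _ in range(q)]
    lhs, rhs = polarization_sides(P, Q, u, v)
    print(abs(lhs - rhs) < 1e-9)  # True
```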



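Lemma 4. Assume that Assumption 2 holds. We have

$$\sum_{j\geq 2}\|Z_{T_j}h\| \leq 3\sqrt{s/l}\,\|Z_{T_0}h\|.$$

Proof. From the left-hand side, we have

$$\begin{aligned}
\sum_{j\geq 2}\|Z_{T_j}h\| &\leq \sum_{j\geq 2}\sqrt{l\,\|Z_{T_j}h\|_\infty^2} \leq \sum_{j\geq 2}\sqrt{l\left(\|Z_{T_{j-1}}h\|_1/l\right)^2}\\
&\leq \|Z_{T_0^c}h\|_1/\sqrt{l}\\
&\leq 3\|Z_{T_0}h\|_1/\sqrt{l} \quad\text{(from Lemma 2)}\\
&\leq 3\sqrt{s/l}\,\|Z_{T_0}h\|.
\end{aligned}$$

It completes the proof. □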
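The first inequalities in the proof of Lemma 4 do not use Assumption 2: for any vector $w$ (standing in for $Zh$) and any $T_0$, $\sum_{j\geq 2}\|w_{T_j}\| \leq \|w_{T_0^c}\|_1/\sqrt{l}$, because every entry of $T_j$ is no larger in magnitude than the average magnitude over the full block $T_{j-1}$. The sketch below (plain Python, hypothetical names) checks this on random vectors; the remaining steps need the cone condition of Lemma 2 and are not tested here.

```python
import math
import random
from typing import List, Sequence, Tuple


def blocks(w: Sequence[float], T0: Sequence[int], l: int) -> List[List[int]]:
    """T_1, T_2, ...: indices outside T0 in blocks of size l, by decreasing |w_i|."""
    in_T0 = set(T0)
    rest = sorted((i for i in range(len(w)) if i not in in_T0),
                  key=lambda i: abs(w[i]), reverse=True)
    return [rest[k:k + l] for k in range(0, len(rest), l)]


def shelling_sides(w: Sequence[float], T0: Sequence[int], l: int) -> Tuple[float, float]:
    """Return (sum_{j>=2} ||w_{T_j}||_2, ||w_{T0^c}||_1 / sqrt(l))."""
    parts = blocks(w, T0, l)
    lhs = sum(math.sqrt(sum(w[i] ** 2 for i in T)) for T in parts[1:])
    rhs = sum(abs(w[i]) for T in parts for i in T) / math.sqrt(l)
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    ok = True
    for _ in range(200):
        m = rng.randint(5, 40)
        w = [rng.gauss(0.0, 1.0) for _ in range(m)]
        T0 = rng.sample(range(m), rng.randint(0, m // 3))
        l = rng.randint(1, 6)
        lhs, rhs = shelling_sides(w, T0, l)
        ok = ok and lhs <= rhs + 1e-9
    print(ok)  # True
```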

Lemma 5. Assume that Assumption 2 holds. We have

$$\|Xh\|^2 \geq W_{Xh,1}\|Z^+_{T_{01}}Z_{T_{01}}h\|^2 - W_{Xh,2}\|Z^+_{T_{01}}Z_{T_{01}}h\|\,\|Z_{T_0}h\|,$$

where $W_{Xh,1}$ and $W_{Xh,2}$ are defined in Theorem 1.
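Proof. The inequality is derived from

$$\begin{aligned}
\|Xh\|^2 &= |h^TZ^T(Z^+)^TX^TXZ^+Zh|\\
&\geq h^TZ_{T_{01}}^T(Z^+_{T_{01}})^TX^TXZ^+_{T_{01}}Z_{T_{01}}h - 2\sum_{j\geq 2}|h^TZ_{T_{01}}^T(Z^+_{T_{01}})^TX^TXZ^+_{T_j}Z_{T_j}h|\\
&\geq h^TZ_{T_{01}}^T(Z^+_{T_{01}})^TX^TXZ^+_{T_{01}}Z_{T_{01}}h - 2\|XZ^+_{T_{01}}Z_{T_{01}}h\|\sum_{j\geq 2}\|XZ^+_{T_j}Z_{T_j}h\|\\
&\geq \rho^-_{X,Z^+}(s+l)\|Z^+_{T_{01}}Z_{T_{01}}h\|^2 - 2\sqrt{\rho^+_{X,Z^+}(s+l)}\,\|Z^+_{T_{01}}Z_{T_{01}}h\|\sum_{j\geq 2}\sqrt{\rho^+_{X,Z^+}(l)}\,\|Z^+_{T_j}Z_{T_j}h\|\\
&\geq \rho^-_{X,Z^+}(s+l)\|Z^+_{T_{01}}Z_{T_{01}}h\|^2 - 2\rho^+_{X,Z^+}(s+l)\|Z^+_{T_{01}}Z_{T_{01}}h\|\,\|Z^+\|\sum_{j\geq 2}\|Z_{T_j}h\|\\
&\geq \rho^-_{X,Z^+}(s+l)\|Z^+_{T_{01}}Z_{T_{01}}h\|^2 - 6\rho^+_{X,Z^+}(s+l)\|Z^+_{T_{01}}Z_{T_{01}}h\|\,\|Z^+\|\sqrt{s/l}\,\|Z_{T_0}h\| \quad\text{(from Lemma 4)}\\
&= \rho^-_{X,Z^+}(s+l)\|Z^+_{T_{01}}Z_{T_{01}}h\|^2 - 6\sigma_{\min}^{-1}(Z)\rho^+_{X,Z^+}(s+l)\sqrt{s/l}\,\|Z^+_{T_{01}}Z_{T_{01}}h\|\,\|Z_{T_0}h\|.
\end{aligned}$$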

It completes the proof. □ 
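The first two inequalities in the proof of Lemma 5 only use that $X^TX$ is positive semidefinite and the Cauchy-Schwarz inequality. A minimal numerical sketch, assuming NumPy, with a random `X`, a vector `a` standing in for $Z^+_{T_{01}}Z_{T_{01}}h$, and vectors `bs` standing in for the terms $Z^+_{T_j}Z_{T_j}h$, $j \ge 2$ (the helper `lemma5_bounds` is hypothetical):

```python
import numpy as np

def lemma5_bounds(X, a, bs):
    """Return ||X(a + sum_j b_j)||^2 and the two lower bounds used in Lemma 5."""
    M = X.T @ X                                   # positive semidefinite Gram matrix
    total = a + np.sum(bs, axis=0)
    exact = total @ M @ total
    # drop the nonnegative quadratic term in sum_j b_j, keep the cross terms
    bound1 = a @ M @ a - 2 * sum(abs(a @ M @ b) for b in bs)
    # Cauchy-Schwarz: |a^T X^T X b| <= ||X a|| ||X b||
    bound2 = a @ M @ a - 2 * np.linalg.norm(X @ a) * sum(np.linalg.norm(X @ b) for b in bs)
    return exact, bound1, bound2

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 12))
a = rng.standard_normal(12)
bs = [rng.standard_normal(12) for _ in range(4)]
exact, bound1, bound2 = lemma5_bounds(X, a, bs)
print(exact >= bound1 >= bound2)
```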

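Lemma 6. Assume that Assumption 2 holds. We have

$$\|Xh\|^2 \le 6\sqrt{s}\,\lambda\,\|Z_{T_{01}}h\|. \quad (23)$$

Proof. From the optimality condition, there exists $g$ satisfying $Z^Tg \in \partial\|Z\hat\beta\|_1$ and $\|g\|_\infty \le 1$ such that

$$\begin{aligned}
& X^T(X\hat\beta - y) = -\lambda Z^Tg\\
\Rightarrow\ & X^T(I - A(A^TA)^{-1}A^T)(B\hat\beta - c) = -\lambda Z^Tg \quad (\text{due to the definition of } X)\\
\Rightarrow\ & X^T(I - A(A^TA)^{-1}A^T)(B\hat\beta - A\alpha^* - B\beta^* - e) = -\lambda Z^Tg\\
\Rightarrow\ & X^T(I - A(A^TA)^{-1}A^T)(B\hat\beta - B\beta^* - e) = -\lambda Z^Tg\\
\Rightarrow\ & X^T(X(\hat\beta - \beta^*) - e) = -\lambda Z^Tg\\
\Rightarrow\ & X^TXh = -\lambda Z^Tg + X^Te\\
\Rightarrow\ & h^TX^TXh = -\lambda h^TZ^Tg + h^TX^Te\\
\Rightarrow\ & \|Xh\|^2 \le \lambda\|Zh\|_1\|g\|_\infty + \|Zh\|_1\|(Z^+)^TX^Te\|_\infty\\
\Rightarrow\ & \|Xh\|^2 \le \tfrac32\lambda\|Zh\|_1 \le \tfrac32\lambda(\|Z_{T_0}h\|_1 + \|Z_{T_0^c}h\|_1) \le 6\lambda\|Z_{T_0}h\|_1 \le 6\sqrt{s}\,\lambda\|Z_{T_{01}}h\|.
\end{aligned}$$

The second-to-last line uses $h^TX^Te = h^TZ^T(Z^+)^TX^Te$ (since $h = Z^+Zh$), and the last line uses Assumption 2, Lemma 2, and $\|Z_{T_0}h\|_1 \le \sqrt{s}\|Z_{T_0}h\| \le \sqrt{s}\|Z_{T_{01}}h\|$. It completes the proof. □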
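Lemma 7. Assume that Assumption 2 holds. We have

$$\|d\| = \|\hat\alpha - \alpha^*\| \le W_{d,1}\|Z^+_{T_{01}}Z_{T_{01}}h\| + W_{d,2}\|Z_{T_0}h\| + \|(A^TA)^{-1}A^Te\|,$$

where $W_{d,1}$ and $W_{d,2}$ are defined in Theorem 1.

Proof. Noticing that $\hat\alpha = -(A^TA)^{-1}A^T(B\hat\beta - c)$, we have

$$\begin{aligned}
\hat\alpha &= -(A^TA)^{-1}A^T(B\hat\beta - A\alpha^* - B\beta^* - e)\\
\Rightarrow\ \hat\alpha - \alpha^* &= -(A^TA)^{-1}A^T(B(\hat\beta - \beta^*) - e)\\
&= -(A^TA)^{-1}A^T(Bh - e).
\end{aligned}$$

It follows that

$$\|d\| \le \sigma^{-1}_{\min}(A^TA)\|A^TBh\| + \|(A^TA)^{-1}A^Te\|. \quad (24)$$

Consider $\|A^TBh\|$ as follows, where $\rho^\pm(\cdot,\cdot)$ abbreviates $\rho^\pm_{[A,B],Z^+}(\cdot,\cdot)$:

$$\begin{aligned}
\|A^TBh\| &= \Big\|A^T\Big(BZ^+_{T_{01}}Z_{T_{01}} + \sum_{j\ge2}BZ^+_{T_j}Z_{T_j}\Big)h\Big\|\\
&\le \|A^TBZ^+_{T_{01}}Z_{T_{01}}h\| + \sum_{j\ge2}\|A^TBZ^+_{T_j}Z_{T_j}h\|\\
&\le \tfrac12\big(\rho^+(p-r,s+l) - \rho^-(p-r,s+l)\big)\|Z^+_{T_{01}}Z_{T_{01}}h\| + \tfrac12\big(\rho^+(p-r,l) - \rho^-(p-r,l)\big)\sum_{j\ge2}\|Z^+_{T_j}Z_{T_j}h\|\\
&\le \tfrac12\big(\rho^+(p-r,s+l) - \rho^-(p-r,s+l)\big)\|Z^+_{T_{01}}Z_{T_{01}}h\| + \tfrac12\sigma^{-1}_{\min}(Z)\big(\rho^+(p-r,l) - \rho^-(p-r,l)\big)\sum_{j\ge2}\|Z_{T_j}h\|\\
&\le \tfrac12\big(\rho^+(p-r,s+l) - \rho^-(p-r,s+l)\big)\|Z^+_{T_{01}}Z_{T_{01}}h\| + \tfrac32\sigma^{-1}_{\min}(Z)\big(\rho^+(p-r,l) - \rho^-(p-r,l)\big)\sqrt{s/l}\,\|Z_{T_0}h\|.
\end{aligned}$$

The second inequality is due to Lemma 3, and the last inequality is due to Lemma 4. Plugging it into Eq. (24), we obtain the claim. □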
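The elimination of $\alpha$ used in Lemmas 6, 7, and 11 can be checked numerically. The following sketch (assuming NumPy, random Gaussian data, and a hypothetical helper `eliminate_alpha`; `q` plays the role of $p - r$) forms $X$ and $y$ from $A$, $B$, $c$ and verifies three identities: the profiled residual equals $X\beta - y$, $X^T(X\beta^* - y) = -X^Te$, and $\hat\alpha - \alpha^* = -(A^TA)^{-1}A^T(Bh - e)$.

```python
import numpy as np

def eliminate_alpha(A, B, c):
    """Profile alpha out of 0.5 * ||A alpha + B beta - c||^2.

    Returns X = (I - A (A^T A)^{-1} A^T) B, y = (I - A (A^T A)^{-1} A^T) c, and
    the map beta -> alpha(beta) = -(A^T A)^{-1} A^T (B beta - c).
    """
    pinvA = np.linalg.solve(A.T @ A, A.T)              # (A^T A)^{-1} A^T
    proj = np.eye(A.shape[0]) - A @ pinvA              # projector onto range(A)^perp
    return proj @ B, proj @ c, (lambda beta: -pinvA @ (B @ beta - c))

rng = np.random.default_rng(2)
n, q, r = 40, 3, 8                                     # q plays the role of p - r
A, B = rng.standard_normal((n, q)), rng.standard_normal((n, r))
alpha_star, beta_star = rng.standard_normal(q), rng.standard_normal(r)
e = 0.1 * rng.standard_normal(n)
c = A @ alpha_star + B @ beta_star + e
X, y, alpha_of = eliminate_alpha(A, B, c)

beta_hat = rng.standard_normal(r)                      # any candidate beta
h = beta_hat - beta_star
pinvA = np.linalg.solve(A.T @ A, A.T)
print(np.allclose(A @ alpha_of(beta_hat) + B @ beta_hat - c, X @ beta_hat - y),
      np.allclose(X.T @ (X @ beta_star - y), -X.T @ e),
      np.allclose(alpha_of(beta_hat) - alpha_star, -pinvA @ (B @ h - e)))
```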
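Lemma 8. Assume that Assumption 2 holds. For any $l > (3\kappa)^2 s$, we have that

$$\begin{aligned}
\|h\| &\le \|Z^+_{T_{01}}Z_{T_{01}}h\| + W_h\|Z_{T_{01}}h\|; && (25a)\\
\|Z_{T_{01}}h\| &\le W_a\|Z^+_{T_{01}}Z_{T_{01}}h\|; && (25b)\\
\|Z^+_{T_{01}}Z_{T_{01}}h\| &\le \frac{6\sqrt{s}\,W_a\lambda}{W_{Xh,1} - W_{Xh,2}W_a}; && (25c)\\
\|Z_{T_{01}}h\| &\le \frac{6\sqrt{s}\,W_a^2\lambda}{W_{Xh,1} - W_{Xh,2}W_a}; && (25d)
\end{aligned}$$

where $W_a$ and $W_h$ are defined in Theorem 1.

Proof. The first inequality is obtained from

$$\|h\| = \|Z^+Zh\| \le \|Z^+_{T_{01}}Z_{T_{01}}h\| + \sum_{j\ge2}\|Z^+_{T_j}Z_{T_j}h\| \le \|Z^+_{T_{01}}Z_{T_{01}}h\| + 3\sqrt{s/l}\,\sigma^{-1}_{\min}(Z)\|Z_{T_{01}}h\|,$$

using Lemma 4 and $\|Z_{T_0}h\| \le \|Z_{T_{01}}h\|$. It follows that

$$\begin{aligned}
&\sigma^{-1}_{\max}(Z)\|Z_{T_{01}}h\| \le \sigma_{\min}(Z^+)\|Zh\| \le \|Z^+Zh\| = \|h\| \le \|Z^+_{T_{01}}Z_{T_{01}}h\| + 3\sqrt{s/l}\,\sigma^{-1}_{\min}(Z)\|Z_{T_{01}}h\|\\
\Rightarrow\ & \|Z_{T_{01}}h\| \le \big(\sigma^{-1}_{\max}(Z) - 3\sqrt{s/l}\,\sigma^{-1}_{\min}(Z)\big)^{-1}\|Z^+_{T_{01}}Z_{T_{01}}h\|\\
\Rightarrow\ & \|Z_{T_{01}}h\| \le \frac{\sigma_{\max}(Z)\sigma_{\min}(Z)}{\sigma_{\min}(Z) - 3\sqrt{s/l}\,\sigma_{\max}(Z)}\|Z^+_{T_{01}}Z_{T_{01}}h\| = W_a\|Z^+_{T_{01}}Z_{T_{01}}h\|,
\end{aligned}$$

which implies the second inequality (the denominator is positive because $l > (3\kappa)^2 s$). The third inequality is satisfied automatically if $\|Z^+_{T_{01}}Z_{T_{01}}h\| = 0$. We only need to prove the case $\|Z^+_{T_{01}}Z_{T_{01}}h\| \neq 0$. By Lemma 5 and Lemma 6,

$$\begin{aligned}
& W_{Xh,1}\|Z^+_{T_{01}}Z_{T_{01}}h\|^2 - W_{Xh,2}\|Z_{T_0}h\|\,\|Z^+_{T_{01}}Z_{T_{01}}h\| \le 6\sqrt{s}\,\lambda\|Z_{T_{01}}h\| \le 6\sqrt{s}\,W_a\lambda\|Z^+_{T_{01}}Z_{T_{01}}h\|\\
\Rightarrow\ & W_{Xh,1}\|Z^+_{T_{01}}Z_{T_{01}}h\| - W_{Xh,2}\|Z_{T_0}h\| \le 6\sqrt{s}\,W_a\lambda\\
\Rightarrow\ & W_{Xh,1}\|Z^+_{T_{01}}Z_{T_{01}}h\| - W_aW_{Xh,2}\|Z^+_{T_{01}}Z_{T_{01}}h\| \le 6\sqrt{s}\,W_a\lambda\\
\Rightarrow\ & \|Z^+_{T_{01}}Z_{T_{01}}h\| \le \frac{6\sqrt{s}\,W_a\lambda}{W_{Xh,1} - W_{Xh,2}W_a},
\end{aligned}$$

where the third line uses $\|Z_{T_0}h\| \le \|Z_{T_{01}}h\| \le W_a\|Z^+_{T_{01}}Z_{T_{01}}h\|$ and the last line requires $W_{Xh,1} - W_{Xh,2}W_a > 0$. The last claim is from the combination of the second and third inequalities. □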
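To make the constants of Lemma 8 concrete, the sketch below (assuming NumPy, $\kappa = \sigma_{\max}(Z)/\sigma_{\min}(Z)$, and a hypothetical helper `lemma8_constants`) evaluates $W_h$ and $W_a$ as they appear in the proof. With the choice $l = (10\kappa)^2 s$ used in the proof of Theorem 3, one gets $\sqrt{s/l} = 1/(10\kappa)$, hence $W_a = \frac{10}{7}\sigma_{\max}(Z)$ and $W_h = 0.3/\sigma_{\max}(Z)$; for $l \le (3\kappa)^2 s$ the denominator of $W_a$ is not positive and the helper raises an error.

```python
import numpy as np

def lemma8_constants(Z, s, l):
    """Return (W_h, W_a) with W_h = 3 sqrt(s/l) / sigma_min(Z) and
    W_a = sigma_max(Z) sigma_min(Z) / (sigma_min(Z) - 3 sqrt(s/l) sigma_max(Z))."""
    sv = np.linalg.svd(Z, compute_uv=False)        # singular values, descending
    smax, smin = sv[0], sv[-1]
    ratio = np.sqrt(s / l)
    denom = smin - 3.0 * ratio * smax
    if denom <= 0:                                 # happens exactly when l <= (3 kappa)^2 s
        raise ValueError("need l > (3 * kappa)^2 * s")
    return 3.0 * ratio / smin, smax * smin / denom

rng = np.random.default_rng(3)
Z = rng.standard_normal((50, 20))                  # full column rank with probability 1
sv = np.linalg.svd(Z, compute_uv=False)
kappa = sv[0] / sv[-1]
s = 4
W_h, W_a = lemma8_constants(Z, s, l=(10 * kappa) ** 2 * s)
print(np.isclose(W_a, 10 / 7 * sv[0]), np.isclose(W_h, 0.3 / sv[0]))
```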
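Proof of Theorem 1

Proof. Applying Lemma 7 and Lemma 8, we obtain

$$\begin{aligned}
\|\hat\theta - \theta^*\| &\le \|d\| + \|h\|\\
&\le (1 + W_{d,1})\|Z^+_{T_{01}}Z_{T_{01}}h\| + (W_h + W_{d,2})\|Z_{T_{01}}h\| + \|(A^TA)^{-1}A^Te\|\\
&\le 6\sqrt{s}\,\frac{(1 + W_{d,1})W_a + (W_h + W_{d,2})W_a^2}{W_{Xh,1} - W_{Xh,2}W_a}\,\lambda + \|(A^TA)^{-1}A^Te\|\\
&= W_\theta\lambda + \|(A^TA)^{-1}A^Te\|.
\end{aligned}$$

It completes the proof. □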



B. Proofs of Theorem 2, Theorem 3, and Theorem 4
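Lemma 9. For any $Q \in \mathbb{R}^{n\times(p-r)}$, we have

$$\mathbb{P}\left(\|Q^Te\| \le \Omega(\|Q\|_F\,\Delta\sqrt{\log p})\right) \ge 1 - \Omega(1/p). \quad (26)$$

Proof. Since the $e_i$'s are i.i.d. centered sub-Gaussian noise with sub-Gaussian norm $\Delta$, from [Zhang, 2009a, Proposition 10.2], one has that

$$\mathbb{P}\left(\|Q^Te\| \ge \|Q\|_F(\Omega(\Delta) + t)\right) \le \exp\{-\Omega(t^2/\Delta^2)\}.$$

Taking $t = \Omega(\Delta\sqrt{\log p})$, we obtain that

$$\mathbb{P}\left(\|Q^Te\| > \Omega(\|Q\|_F\,\Delta\sqrt{\log p})\right) \le \Omega(1/p),$$

which indicates the claim. □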
Lemma 10. For any matrix $Q \in \mathbb{R}^{n\times m}$, we have

$$\mathbb{P}\left(\|Q^Te\|_\infty \le \Omega\big(\max_j\|Q_j\|\,\Delta\sqrt{\log(em)}\big)\right) \ge 1 - \Omega(1/m).$$

Proof. Since the $e_i$'s are i.i.d. centered sub-Gaussian noise with sub-Gaussian norm $\Delta$, $Q_j^Te$ is a centered sub-Gaussian random variable with sub-Gaussian norm $O(\|Q_j\|\Delta)$, where $Q_j$ is the $j$th column of $Q$. Using the Hoeffding-type inequality [see Vershynin, 2011, Lemma 5.9] and the property of sub-Gaussian random variables, we obtain

$$\mathbb{P}(|Q_j^Te| > t) \le \exp\left\{1 - \Omega\left(\frac{t^2}{\|Q_j\|^2\Delta^2}\right)\right\},$$

which indicates that

$$\mathbb{P}(\|Q^Te\|_\infty > t) = \mathbb{P}\big(\max_j|Q_j^Te| > t\big) \le m\exp\left\{1 - \Omega\left(\frac{t^2}{\max_j\|Q_j\|^2\Delta^2}\right)\right\}.$$

Taking $t = \Omega(\max_j\|Q_j\|\Delta\sqrt{\log(em)})$ (the constant in front of $\max_j\|Q_j\|\Delta\sqrt{\log(em)}$ must be large enough: if the exponent above reads $1 - c\,t^2/(\max_j\|Q_j\|^2\Delta^2)$, it suffices that this constant be at least $\sqrt{2/c}$), we have

$$\mathbb{P}\left(\|Q^Te\|_\infty > \Omega\big(\max_j\|Q_j\|\Delta\sqrt{\log(em)}\big)\right) \le \Omega(1/m),$$

which implies the claim.

□
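For Gaussian noise (a special case of the sub-Gaussian setting of Lemma 10), the union-bound argument can be made explicit: with $t = \max_j\|Q_j\|\,\sigma\sqrt{2\log(2m/\delta)}$, where the noise standard deviation $\sigma$ plays the role of $\Delta$, each $|Q_j^Te|$ exceeds $t$ with probability at most $\delta/m$, so $\|Q^Te\|_\infty > t$ with probability at most $\delta$. The Monte Carlo sketch below assumes NumPy, i.i.d. $N(0,\sigma^2)$ noise, and a hypothetical helper `max_tail_frequency`; it only illustrates the bound on one random $Q$.

```python
import numpy as np

def max_tail_frequency(Q, sigma, delta, trials, rng):
    """Empirical P(||Q^T e||_inf > t) for Gaussian e ~ N(0, sigma^2 I_n), with the
    union-bound threshold t = max_j ||Q_j|| * sigma * sqrt(2 log(2m / delta))."""
    n, m = Q.shape
    t = np.linalg.norm(Q, axis=0).max() * sigma * np.sqrt(2 * np.log(2 * m / delta))
    E = sigma * rng.standard_normal((trials, n))   # each row is one noise draw
    stats = np.abs(E @ Q).max(axis=1)              # ||Q^T e||_inf for each draw
    return float(np.mean(stats > t)), t

rng = np.random.default_rng(4)
Q = rng.standard_normal((60, 50))
freq, t = max_tail_frequency(Q, sigma=0.5, delta=0.05, trials=4000, rng=rng)
print(freq, t)  # the union bound guarantees the true exceedance probability is <= delta
```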



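Lemma 11. Assume that $\Phi$ is a Gaussian random matrix. With probability at least $1 - \Omega(1/m) - \Omega(1/p)$, we have

$$\|(Z^+)^TX^T(X\beta^* - y)\|_\infty = \|(Z^+)^TX^Te\|_\infty \le \lambda/2, \quad (27)$$

where $\lambda = \Omega\left(\Delta\sigma^{-1}_{\min}(Z)\big(\sqrt{n+r-p} + ((n+r-p)\log(p))^{1/4}\big)\sqrt{\log(em)}\right)$.

Proof. First, from $X^T(X\beta^* - y) = X^T(I - A(A^TA)^{-1}A^T)(B\beta^* - c) = X^T(I - A(A^TA)^{-1}A^T)(-A\alpha^* - e) = -X^Te$, we prove the first part of the claim, $\|(Z^+)^TX^T(X\beta^* - y)\|_\infty = \|(Z^+)^TX^Te\|_\infty$.

Let $Z_j^+$ be the $j$th column of $Z^+$, let $I - A(A^TA)^{-1}A^T = PP^T$ where $P \in \mathbb{R}^{n\times(n+r-p)}$ has orthogonal columns, and let $Y = P^T\Phi V_p \in \mathbb{R}^{(n+r-p)\times r}$. One can verify that $Y$ is a Gaussian random matrix. Using Eq. (3.2) in [Mendelson et al., 2008], we have

$$\begin{aligned}
&\mathbb{P}\left(\left|\frac{\|(I - A(A^TA)^{-1}A^T)\Phi V_pZ_j^+\|^2}{n+r-p} - \|Z_j^+\|^2\right| > t\|Z_j^+\|^2\right)\\
=\ &\mathbb{P}\left(\left|\frac{\|PP^T\Phi V_pZ_j^+\|^2}{n+r-p} - \|Z_j^+\|^2\right| > t\|Z_j^+\|^2\right)\\
=\ &\mathbb{P}\left(\left|\frac{\|YZ_j^+\|^2}{n+r-p} - \|Z_j^+\|^2\right| > t\|Z_j^+\|^2\right) \le \exp\{-\Omega((n+r-p)t^2)\}.
\end{aligned}$$

It follows that

$$\begin{aligned}
&\mathbb{P}\left(\frac{\|XZ_j^+\|^2}{\|Z_j^+\|^2} > (1+t)(n+r-p)\right) \le \exp\{-\Omega((n+r-p)t^2)\}\\
\Rightarrow\ &\mathbb{P}\left(\max_j\frac{\|XZ_j^+\|^2}{\|Z_j^+\|^2} > (1+t)(n+r-p)\right) \le r\exp\{-\Omega((n+r-p)t^2)\}\\
\Rightarrow\ &\mathbb{P}\left(\max_j\|XZ_j^+\| > \sqrt{(1+t)(n+r-p)}\,\max_j\|Z_j^+\|\right) \le r\exp\{-\Omega((n+r-p)t^2)\}\\
\Rightarrow\ &\mathbb{P}\left(\max_j\|XZ_j^+\| > \sqrt{(1+t)(n+r-p)}\,\sigma^{-1}_{\min}(Z)\right) \le \exp\{\log(r) - \Omega((n+r-p)t^2)\}.
\end{aligned}$$

Taking $t = \Omega(\sqrt{\log(p)/(n+r-p)})$, we obtain

$$\mathbb{P}\left(\max_j\|XZ_j^+\| > \big(\sqrt{n+r-p} + \Omega(((n+r-p)\log(p))^{1/4})\big)\sigma^{-1}_{\min}(Z)\right) \le \Omega(1/p).$$

Applying Lemma 10 with $Q = XZ^+$, we obtain

$$\begin{aligned}
&\mathbb{P}\left(\|(Z^+)^TX^Te\|_\infty > \Omega\big(\max_j\|XZ_j^+\|\,\Delta\sqrt{\log(em)}\big)\right) \le \Omega(1/m)\\
\Rightarrow\ &\mathbb{P}\left(\|(Z^+)^TX^Te\|_\infty > \big(\sqrt{n+r-p} + \Omega(((n+r-p)\log(p))^{1/4})\big)\sigma^{-1}_{\min}(Z)\,\Omega(\Delta\sqrt{\log(em)})\right) \le \Omega(1/m) + \Omega(1/p)\\
\Rightarrow\ &\mathbb{P}\left(\|(Z^+)^TX^Te\|_\infty > \Omega\big(\Delta\sigma^{-1}_{\min}(Z)(\sqrt{n+r-p} + ((n+r-p)\log(p))^{1/4})\sqrt{\log(em)}\big)\right) \le \Omega(1/m) + \Omega(1/p),
\end{aligned}$$

which implies the claim.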
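The matrix $P$ in the proof of Lemma 11 (and of Theorem 2) can be obtained from a complete QR factorization of $A$. A minimal sketch, assuming NumPy and a full-column-rank $A$ with $q = p - r$ columns (the helper `complement_basis` is hypothetical), checks that $PP^T = I - A(A^TA)^{-1}A^T$, that $P$ has $n - q = n + r - p$ orthonormal columns, and that $\mathbb{E}\|P^Tg\|^2 = n + r - p$ for a standard Gaussian vector $g$, which is the scaling behind $\|XZ_j^+\|^2 \approx (n+r-p)\|Z_j^+\|^2$.

```python
import numpy as np

def complement_basis(A):
    """P with orthonormal columns and P P^T = I - A (A^T A)^{-1} A^T
    (A must have full column rank)."""
    n, q = A.shape
    Qfull, _ = np.linalg.qr(A, mode="complete")    # first q columns span range(A)
    return Qfull[:, q:]                            # n x (n - q) basis of range(A)^perp

rng = np.random.default_rng(5)
n, q = 40, 3                                       # q plays the role of p - r
A = rng.standard_normal((n, q))
P = complement_basis(A)
proj = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
G = rng.standard_normal((n, 5000))                 # independent standard Gaussian vectors
mean_sq = np.mean(np.sum((P.T @ G) ** 2, axis=0))  # Monte Carlo estimate of E ||P^T g||^2
print(P.shape, np.allclose(P @ P.T, proj), mean_sq)
```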
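Proof of Theorem 2

Proof. First one can verify that for any matrices $P$ and $Q$ with orthogonal columns, $P^T\Phi Q$ is a Gaussian random matrix. Hence $A$ and $B$ are Gaussian matrices. Let $I - A(A^TA)^{-1}A^T = PP^T$ where $P \in \mathbb{R}^{n\times(n+r-p)}$ has orthogonal columns. Let $F \subset \{1, \ldots, m\}$ be an index set with cardinality $|F| = k$. Let $Q \in \mathbb{R}^{r\times k}$ have orthogonal columns, whose image is the subspace spanned by the columns of $Z^+$ in the index set $F$, and let $Y = P^T\Phi V_pQ \in \mathbb{R}^{(n+r-p)\times k}$. We have

$$\begin{aligned}
&\mathbb{P}\left(\max_{h\in\mathcal{H}(Z^+_F,k)}\frac{\|Xh\|}{\|h\|} > \sqrt{n+r-p} + \Omega(\sqrt{k}) + t\right)\\
=\ &\mathbb{P}\left(\max_{h\in\mathcal{H}(Z^+_F,k)}\frac{\|(I - A(A^TA)^{-1}A^T)\Phi V_ph\|}{\|h\|} > \sqrt{n+r-p} + \Omega(\sqrt{k}) + t\right)\\
=\ &\mathbb{P}\left(\max_{v}\frac{\|PP^T\Phi V_pQv\|}{\|Qv\|} > \sqrt{n+r-p} + \Omega(\sqrt{k}) + t\right)\\
=\ &\mathbb{P}\left(\max_{v}\frac{\|P^T\Phi V_pQv\|}{\|v\|} > \sqrt{n+r-p} + \Omega(\sqrt{k}) + t\right)\\
=\ &\mathbb{P}\left(\max_{v}\frac{\|Yv\|}{\|v\|} > \sqrt{n+r-p} + \Omega(\sqrt{k}) + t\right) \le 2\exp(-\Omega(t^2)),
\end{aligned}$$

where the last inequality uses Theorem 5.39 in [Vershynin, 2011]. Since $\mathcal{H}(Z^+,k) = \bigcup_{F\subset\{1,\ldots,m\},\,|F|=k}\mathcal{H}(Z^+_F,k)$, we have

$$\mathbb{P}\left(\max_{h\in\mathcal{H}(Z^+,k)}\frac{\|Xh\|}{\|h\|} > \sqrt{n+r-p} + \Omega(\sqrt{k}) + t\right) \le \binom{m}{k}2\exp(-\Omega(t^2)) \le 2\exp(k\log(em/k) - \Omega(t^2)).$$

Taking $t = \Omega(\sqrt{k\log(em/k)})$, we obtain

$$\mathbb{P}\left(\max_{h\in\mathcal{H}(Z^+,k)}\frac{\|Xh\|}{\|h\|} > \sqrt{n+r-p} + \Omega(\sqrt{k\log(em/k)})\right) \le 2\exp(-\Omega(k\log(em/k))).$$

Similarly, we have

$$\begin{aligned}
&\mathbb{P}\left(\sqrt{\rho^+_{X,Z^+}(k)} > \sqrt{n+r-p} + \Omega(\sqrt{k\log(em/k)})\right) \le 2\exp(-\Omega(k\log(em/k))),\\
&\mathbb{P}\left(\sqrt{\rho^-_{X,Z^+}(k)} < \sqrt{n+r-p} - \Omega(\sqrt{k\log(em/k)})\right) \le 2\exp(-\Omega(k\log(em/k))).
\end{aligned}$$

Define

$$Y = \Phi V\begin{bmatrix} I_{(p-r)\times(p-r)} & 0\\ 0 & Q\end{bmatrix} \in \mathbb{R}^{n\times(p-r+k)},$$

which is a Gaussian random matrix. We have

$$\begin{aligned}
&\mathbb{P}\left(\max_{h\in\mathbb{R}^{p-r}\times\mathcal{H}(Z^+_F,k)}\frac{\|[A,B]h\|}{\|h\|} > \sqrt{n} + \Omega(\sqrt{k+p-r}) + t\right)\\
=\ &\mathbb{P}\left(\max_{u,v}\frac{\left\|[A,B]\begin{bmatrix}u\\Qv\end{bmatrix}\right\|}{\left\|\begin{bmatrix}u\\Qv\end{bmatrix}\right\|} > \sqrt{n} + \Omega(\sqrt{k+p-r}) + t\right)\\
=\ &\mathbb{P}\left(\max_{u,v}\frac{\left\|Y\begin{bmatrix}u\\v\end{bmatrix}\right\|}{\left\|\begin{bmatrix}u\\v\end{bmatrix}\right\|} > \sqrt{n} + \Omega(\sqrt{k+p-r}) + t\right) \le 2\exp(-\Omega(t^2)).
\end{aligned}$$

Since $\mathbb{R}^{p-r}\times\mathcal{H}(Z^+,k) = \bigcup_{F\subset\{1,\ldots,m\},\,|F|=k}\mathbb{R}^{p-r}\times\mathcal{H}(Z^+_F,k)$, we have

$$\mathbb{P}\left(\max_{h\in\mathbb{R}^{p-r}\times\mathcal{H}(Z^+,k)}\frac{\|[A,B]h\|}{\|h\|} > \sqrt{n} + \Omega(\sqrt{k+p-r}) + t\right) \le \binom{p}{k+p-r}2\exp(-\Omega(t^2)) \le 2\exp\{(k+p-r)\log(ep/(k+p-r)) - \Omega(t^2)\}.$$

Taking $t = \Omega(\sqrt{(k+p-r)\log(ep/(k+p-r))})$, we obtain the third claim. The proof of the last inequality can be obtained similarly. □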
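For very small $m$, the restricted quantities $\rho^\pm_{X,Z^+}(k)$ controlled in the proof above can be computed by brute force over all index sets $F$ with $|F| = k$, mirroring the union $\mathcal{H}(Z^+,k) = \bigcup_F \mathcal{H}(Z^+_F,k)$. The sketch assumes NumPy, uses a hypothetical helper `restricted_extremes`, and replaces $X$ by an i.i.d. Gaussian matrix with `N` rows (standing in for $n + r - p$). It checks the resulting values against the crude Gaussian singular-value interval $[(\sqrt{N} - \sqrt{r} - t)^2, (\sqrt{N} + \sqrt{r} + t)^2]$, not against the sharper $\sqrt{k\log(em/k)}$ rates of the theorem.

```python
import itertools
import numpy as np

def restricted_extremes(X, Zp, k):
    """Brute-force rho^-(k), rho^+(k): extreme values of ||X h||^2 / ||h||^2 over
    h in the union of spans of k columns of Zp (only feasible for small m)."""
    lo, hi = np.inf, 0.0
    for F in itertools.combinations(range(Zp.shape[1]), k):
        basis, _ = np.linalg.qr(Zp[:, list(F)])    # orthonormal basis of span(Zp_F)
        sv = np.linalg.svd(X @ basis, compute_uv=False)
        lo, hi = min(lo, sv[-1] ** 2), max(hi, sv[0] ** 2)
    return lo, hi

rng = np.random.default_rng(6)
N, r, m, k = 200, 10, 12, 3                        # X has N rows and r columns
X = rng.standard_normal((N, r))
Zp = np.linalg.pinv(rng.standard_normal((m, r)))   # r x m, plays the role of Z^+
lo, hi = restricted_extremes(X, Zp, k)
t = 5.0                                            # deviation allowed in the singular-value bound
print(lo, hi, (np.sqrt(N) - np.sqrt(r) - t) ** 2, (np.sqrt(N) + np.sqrt(r) + t) ** 2)
```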

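Proof of Lemma 1

Proof. Using Eq. (13) and Eq. (14), we have

$$\begin{aligned}
W_{Xh,1} - W_{Xh,2}W_a &= \rho^-_{X,Z^+}(s+l) - 6\sigma^{-1}_{\min}(Z)\sqrt{\rho^+_{X,Z^+}(s+l)\,\rho^+_{X,Z^+}(l)}\sqrt{s/l}\;\frac{\sigma_{\max}(Z)\sigma_{\min}(Z)}{\sigma_{\min}(Z) - 3\sqrt{s/l}\,\sigma_{\max}(Z)}\\
&\ge \rho^-_{X,Z^+}(s+l) - \tfrac67\rho^+_{X,Z^+}(s+l)\\
&\ge (n+r-p) - \Omega\left(\sqrt{(n+r-p)(s+l)\log\tfrac{em}{s+l}}\right) - \tfrac67\left((n+r-p) + \Omega\left(\sqrt{(n+r-p)(s+l)\log\tfrac{em}{s+l}}\right)\right)\\
&= \tfrac17(n+r-p) - \Omega\left(\sqrt{(n+r-p)(s+l)\log\tfrac{em}{s+l}}\right)
\end{aligned}$$

holds with probability at least $1 - 2\exp\{-\Omega((s+l)\log(em/(s+l)))\}$. Here the first inequality uses $l = (10\kappa)^2 s$, so that $\sqrt{s/l} = 1/(10\kappa)$, together with $\rho^+_{X,Z^+}(l) \le \rho^+_{X,Z^+}(s+l)$.

□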
Lemma 12. Assume that $A \in \mathbb{R}^{n\times(p-r)}$ with $n > (p-r)$ and $\Phi$ is a Gaussian random matrix. We have

$$\|(A^TA)^{-1}A^T\|_F \le \frac{\sqrt{p-r}}{\sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})}, \quad (28)$$

$$\sigma_{\min}(A^TA) \ge n - \Omega(\sqrt{n(p-r)}). \quad (29)$$

Proof. Since $A = \Phi V_a$ and $V_a$ has orthogonal columns, we know that $A \in \mathbb{R}^{n\times(p-r)}$ is a Gaussian random matrix. Denote by $\sigma_i(A)$ the $i$th largest positive singular value. Then we have

$$\|(A^TA)^{-1}A^T\|_F^2 = \sum_{i=1}^{p-r}\sigma_i^{-2}(A) \le \sum_{i=1}^{p-r}\sigma_{\min}^{-2}(A) = (p-r)\,\sigma_{\min}^{-2}(A).$$

Using Corollary 5.35 in [Vershynin, 2011], we have

$$\mathbb{P}\left[\sigma_{\min}(A) \ge \sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})\right] \ge 1 - \Omega\left(\frac1n\right). \quad (30)$$

Hence the first claim follows from

$$\mathbb{P}\left(\|(A^TA)^{-1}A^T\|_F \le \frac{\sqrt{p-r}}{\sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})}\right) \ge \mathbb{P}\left(\sigma_{\min}(A) \ge \sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})\right) \ge 1 - \Omega\left(\frac1n\right).$$

The second inequality is obtained directly from Eq. (30) with the relationship $\sigma_{\min}(A^TA) = \sigma^2_{\min}(A)$. □
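The identity $\|(A^TA)^{-1}A^T\|_F^2 = \sum_i \sigma_i^{-2}(A)$ and the bound used in Lemma 12 can be checked numerically. The sketch below assumes NumPy, a Gaussian $A$ with $q = p - r$ columns, and a hypothetical helper `frobenius_identity`; the last check uses a deviation $t = 5$ in $\sigma_{\min}(A) \ge \sqrt{n} - \sqrt{q} - t$, which for a Gaussian matrix fails with probability at most $2e^{-t^2/2}$.

```python
import numpy as np

def frobenius_identity(A):
    """Return (||(A^T A)^{-1} A^T||_F^2, sum_i sigma_i(A)^{-2}, q * sigma_min(A)^{-2})."""
    sv = np.linalg.svd(A, compute_uv=False)
    M = np.linalg.solve(A.T @ A, A.T)              # (A^T A)^{-1} A^T, singular values 1/sigma_i
    return np.linalg.norm(M, "fro") ** 2, np.sum(sv ** -2.0), A.shape[1] / sv[-1] ** 2

rng = np.random.default_rng(7)
n, q = 500, 6                                      # q plays the role of p - r
A = rng.standard_normal((n, q))
fro2, sum_inv, bound = frobenius_identity(A)
smin = np.linalg.svd(A, compute_uv=False)[-1]
print(np.isclose(fro2, sum_inv), sum_inv <= bound, smin >= np.sqrt(n) - np.sqrt(q) - 5.0)
```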
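Proof of Theorem 3

Proof. Let $l = (10\kappa)^2 s$. First let us consider the second term of Eq. (12). Using Lemma 9 and Lemma 12, we have that with probability at least $1 - \Omega(p^{-1})$, the following holds:

$$\|(A^TA)^{-1}A^Te\| \le \Omega\left(\frac{\sqrt{p-r}\,\sqrt{\log p}\,\Delta}{\sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})}\right).$$

Now we consider the first term of Eq. (12). Using Eq. (15) and Eq. (16), we derive the following:

$$W_{d,1} = \Omega\left(\sigma^{-1}_{\min}(A^TA)\big(\rho^+_{[A,B],Z^+}(p-r,s+l) - \rho^-_{[A,B],Z^+}(p-r,s+l)\big)\right) = \Omega\left(\frac{\sqrt{n(s+l+p-r)\log(ep/(s+l+p-r))}}{n}\right) = \Omega\left(\sqrt{\frac{s\log(ep/s)}{n}}\right)$$

with probability at least $1 - \exp(-\Omega(s\log(ep/s)))$. Similarly, we have $W_{d,2} = \Omega\left(\sigma^{-1}_{\min}(Z)\sqrt{s\log(ep/s)/n}\right)$ with the same probability. $W_a$ and $W_h$ are bounded by $\Omega(\sigma_{\max}(Z))$ and $\Omega(\sigma^{-1}_{\min}(Z))$, respectively. From Lemma 1, we have $W_{Xh,1} - W_{Xh,2}W_a \ge \Omega(n)$. Now we are ready to estimate the upper bound of $W_\theta$, which holds with probability at least $1 - \Omega(p^{-1}) - \exp(-\Omega(s\log(ep/s)))$:

$$W_\theta = 6\sqrt{s}\,\frac{(1 + W_{d,1})W_a + (W_h + W_{d,2})W_a^2}{W_{Xh,1} - W_{Xh,2}W_a} \le \Omega\left(\frac{\sqrt{s}}{n}\left(\sigma_{\max}(Z)\Big(1 + \sqrt{\tfrac{s\log(ep/s)}{n}}\Big) + \sigma^{-1}_{\min}(Z)\sigma^2_{\max}(Z)\Big(1 + \sqrt{\tfrac{s\log(ep/s)}{n}}\Big)\right)\right).$$

Next let us consider the value of $\lambda$. From Lemma 11, with probability at least $1 - \Omega(p^{-1}) - \Omega(m^{-1})$ we have

$$\lambda \le O\left(\Delta\sigma^{-1}_{\min}(Z)\sqrt{\log(em)}\big(\sqrt{n+r-p} + ((n+r-p)\log(p))^{1/4}\big)\right) \le \Omega\left(\Delta\sigma^{-1}_{\min}(Z)\sqrt{n\log m}\right) \le \Omega\left(\Delta\sigma^{-1}_{\min}(Z)\sqrt{n\log p}\right).$$

Finally, we can express the estimate bound in Eq. (12) as

$$\begin{aligned}
\|\hat\theta - \theta^*\| &\le W_\theta\lambda + \|(A^TA)^{-1}A^Te\|\\
&\le \Omega\left(n^{-1}\big(\sigma_{\max}(Z) + \sigma^{-1}_{\min}(Z)\sigma^2_{\max}(Z)\big)\Big(1 + \sqrt{\tfrac{s\log(ep/s)}{n}}\Big)\Delta\sigma^{-1}_{\min}(Z)\sqrt{sn\log p}\right) + \Omega\left(\frac{\sqrt{p-r}\,\sqrt{\log p}\,\Delta}{\sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})}\right)\\
&= \Omega\left((\Delta\kappa + \Delta\kappa^2)\Big(1 + \sqrt{\tfrac{s\log(ep/s)}{n}}\Big)\sqrt{\tfrac{s\log p}{n}}\right) + \Omega\left(\frac{\sqrt{p-r}\,\sqrt{\log p}\,\Delta}{\sqrt{n} - \sqrt{p-r} - \Omega(\sqrt{\log n})}\right)
\end{aligned}$$

with probability at least $1 - \Omega(p^{-1}) - \Omega(m^{-1}) - \exp(-\Omega(s\log(ep/s)))$.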
□

Proof of Theorem 4



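Proof. Denote each row of $D$ as $d_k^T$, $k = 1, \ldots, m$. The manner to generate $d_k$ indicates that all $d_k$'s are independent, $\mathbb{E}(d_{ki}^2) = 2/p$, and $\mathbb{E}(d_{ki}d_{kj}) = -2/(p(p-1))$ for any $i \neq j$. Hence we have

$$Q := \frac{p}{2}\mathbb{E}(d_kd_k^T) = \begin{bmatrix} 1 & -\frac{1}{p-1} & \cdots & -\frac{1}{p-1}\\ -\frac{1}{p-1} & 1 & \cdots & -\frac{1}{p-1}\\ \vdots & \vdots & \ddots & \vdots\\ -\frac{1}{p-1} & -\frac{1}{p-1} & \cdots & 1\end{bmatrix}. \quad (31)$$

One can verify that all $p-1$ nonzero eigenvalues of $Q$ are identical and positive (they equal $p/(p-1)$, while $Q\mathbf{1} = 0$). Thus, we can decompose $Q$ as $Q = \gamma U_QU_Q^T$ where $\gamma > 0$ and $U_Q \in \mathbb{R}^{p\times(p-1)}$ has orthogonal columns. Let $\tilde d_k = (p/2\gamma)^{1/2}U_Q^Td_k$. It is easy to see that all $\tilde d_k$'s are independent. We can construct $\tilde D$ by $\tilde D = (p/2\gamma)^{1/2}DU_Q$, that is, the $k$th row of $\tilde D$ is $\tilde d_k^T$. Since

$$\mathbb{E}(\tilde d_k\tilde d_k^T) = \frac{p}{2\gamma}\mathbb{E}(U_Q^Td_kd_k^TU_Q) = \frac{1}{\gamma}U_Q^TQU_Q = I_{(p-1)\times(p-1)}, \quad (32)$$

the $\tilde d_k$'s are independent isotropic random vectors. Next, we can verify that the $\tilde d_k$'s are sub-Gaussian random vectors: each entry of $d_k$ is bounded, so for any fixed $x \in \mathbb{R}^{p-1}$ the inner product $\langle x, \tilde d_k\rangle$ is bounded. From Definition 5.22 in [Vershynin, 2011], we know that the $\tilde d_k$'s are sub-Gaussian random vectors. Using Theorem 5.39 in [Vershynin, 2011], we obtain that with probability at least $1 - 2\exp\{-\Omega(p)\}$, one has

$$\sqrt{m} - \Omega(\sqrt{p}) \le \sigma_{\min}(\tilde D) \le \sigma_{\max}(\tilde D) \le \sqrt{m} + \Omega(\sqrt{p}). \quad (33)$$

Note that the singular values of $\tilde D$ are proportional to the nonzero singular values of $D$. Hence, we have

$$\frac{\sigma_{\max}(D)}{\sigma_{\min}(D)} = \frac{\sigma_{\max}(\tilde D)}{\sigma_{\min}(\tilde D)} \le \frac{\sqrt{m} + \Omega(\sqrt{p})}{\sqrt{m} - \Omega(\sqrt{p})},$$

where $\sigma_{\min}(D)$ denotes the smallest nonzero singular value of $D$, which completes the proof. □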
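The moment computation (31) and the conclusion of Theorem 4 can be explored numerically under an assumed generation model: each row of $D$ is $e_i - e_j$ for a pair $\{i, j\}$ drawn uniformly at random, which reproduces the second moments stated in the proof (the paper's exact procedure is described elsewhere). The sketch assumes NumPy; `expected_Q`, `random_graph_D`, and `nonzero_condition_number` are hypothetical helpers. It checks $Q = \frac{p}{p-1}I - \frac{1}{p-1}\mathbf{1}\mathbf{1}^T$ exactly by averaging over all pairs, and reports the ratio of the largest to the smallest nonzero singular value of one random $D$ with $m \gg p$.

```python
import itertools
import numpy as np

def expected_Q(p):
    """(p/2) E[d d^T] for d = e_i - e_j with {i, j} uniform over all pairs (exact average)."""
    Q = np.zeros((p, p))
    pairs = list(itertools.combinations(range(p), 2))
    for i, j in pairs:
        d = np.zeros(p)
        d[i], d[j] = 1.0, -1.0
        Q += np.outer(d, d)
    return (p / 2) * Q / len(pairs)

def random_graph_D(m, p, rng):
    """m x p matrix whose k-th row is e_i - e_j for a uniformly random pair i != j."""
    D = np.zeros((m, p))
    for k in range(m):
        i, j = rng.choice(p, size=2, replace=False)
        D[k, i], D[k, j] = 1.0, -1.0
    return D

def nonzero_condition_number(D):
    """sigma_max / sigma_min of D restricted to the complement of the all-ones vector
    (finite only if the sampled graph is connected)."""
    p = D.shape[1]
    U = np.linalg.eigh(np.eye(p) - np.ones((p, p)) / p)[1][:, 1:]  # orthonormal basis of 1^perp
    sv = np.linalg.svd(D @ U, compute_uv=False)
    return sv[0] / sv[-1]

p = 20
print(np.allclose(expected_Q(p), p / (p - 1) * np.eye(p) - np.ones((p, p)) / (p - 1)))
rng = np.random.default_rng(8)
print(nonzero_condition_number(random_graph_D(2000, p, rng)))
```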






C. Gaussian Ensembles in Vaiter et al. [2012]

Vaiter et al. [2012] studied the formulation (2) and provided an upper bound for the estimate error, but did not consider Gaussian ensembles for $\Phi$. To study whether their result implies the consistency property, we assume $\Phi$ to be a Gaussian matrix in their upper bound. It turns out that their bound does not imply the consistency property even in the special case $D = I_{p\times p}$. Recall the error bound in [Vaiter et al., 2012, Theorem 3]:

$$\|\hat\theta - \theta^*\| \le \|A^{[T_0^c]}\|\,\|e\|\,\big(\|\Phi\| + C\|Z_{T_0}\|_{2,\infty}\big).$$

Please refer to the original paper for the definitions of $A^{[T_0^c]}$ and $\|\cdot\|_{2,\infty}$. To simplify the following discussion, we assume $D = I_{p\times p}$ and $\Phi$ to be a Gaussian random matrix, let $C = 0$, and only consider the interesting scenario $p > n > s$. Note that setting $C = 0$ only makes their bound smaller, i.e., their result stronger. One can verify that $\|\Phi\| \le O(\sqrt{p} + \sqrt{n})$ and $\|e\| \le O(\sqrt{n})$ from random matrix theory [Vershynin, 2011]. To estimate the value of $\|A^{[T_0^c]}\|$, we have



$$\begin{aligned}
\|A^{[T_0^c]}\| &= \max_u\frac{\|A^{[T_0^c]}u\|}{\|u\|}\\
&= \max_u\frac{1}{\|u\|}\left\|\mathop{\arg\min}_{x:\,x_{T_0^c}=0}\ \tfrac12\|\Phi x\|^2 - \langle x,u\rangle\right\|\\
&= \max_u\frac{1}{\|u\|}\left\|\mathop{\arg\min}_{x_{T_0}}\ \tfrac12\|\Phi_{T_0}x_{T_0}\|^2 - \langle x_{T_0},u_{T_0}\rangle\right\|\\
&= \max_u\frac{\|(\Phi_{T_0}^T\Phi_{T_0})^{-1}u_{T_0}\|}{\|u\|}\\
&\le O\big((\sqrt{n} - \sqrt{s})^{-2}\big)\\
&\le O(n^{-1}),
\end{aligned}$$

where the second line uses Definition 3 in [Vaiter et al., 2012], and $\Phi_{T_0}$ denotes the matrix consisting of the columns of $\Phi$ in the index set $T_0$. It follows that

$$\|\hat\theta - \theta^*\| \le \|A^{[T_0^c]}\|\,\|e\|\,\|\Phi\| \le O\big(n^{-1}\sqrt{n}\,(\sqrt{p} + \sqrt{n})\big) = O\big(\sqrt{p/n} + 1\big),$$

which implies that even given $n = O(s\log p)$, the upper bound is not guaranteed to converge to zero. Hence the result in [Vaiter et al., 2012] does not imply the consistency property.